MERGING FOR INHOMOGENEOUS FINITE MARKOV CHAINS,
PART II: NASH AND LOG-SOBOLEV INEQUALITIES 

By L. Saloff-Coste 1 and J. Zuniga 2 
Cornell University and Stanford University 

We study time-inhomogeneous Markov chains with finite state 
spaces using Nash and logarithmic-Sobolev inequalities, and the no- 
tion of c-stability. We develop the basic theory of such functional 
inequalities in the time-inhomogeneous context and provide illustrat- 
ing examples. 

1. Introduction. 

1.1. Background. This article is part of a series of works where we study 
quantitative merging properties of time inhomogeneous finite Markov chains. 
Time inhomogeneity leads to a great variety of behaviors. Moreover, even in 
rather simple situations, we are at a loss to study how a time inhomogeneous 
Markov chain might behave. Here, we focus on a natural but restricted type 
of problem. Consider a sequence of aperiodic irreducible Markov kernels
$(K_i)_1^\infty$ on a finite set $V$. Let $\pi_i$ be the invariant measure of $K_i$. Assume
that, in a sense to be made precise, all $K_i$ and all $\pi_i$ are similar and the
behavior of the time homogeneous chains driven by each $K_i$ separately is
understood. Can we then describe the behavior of the time inhomogeneous
chain driven by the sequence $(K_i)_1^\infty$?

To give a concrete example, on $V_N = \{0, \dots, N\}$, consider a sequence of
aperiodic irreducible birth and death chain kernels $K_i$, $i = 1, 2, \dots$, with

\[ 1/4 \le K_i(x,y) \le 3/4 \quad \text{if } |x-y| \le 1 \]

and with reversible measure $\pi_i$ satisfying $1/4 \le (N+1)\pi_i(x) \le 4$, for all
$x \in V_N$. What can we say about the behavior of the corresponding time
inhomogeneous Markov chain?

Remarkably enough, there is very little known about this question. What 
can we expect to be true? What can we try to prove? Let $K_{0,n}(x,\cdot)$ denote
the distribution, after $n$ steps, of the time inhomogeneous chain described
above started at $x$. It is not hard to see that such a chain satisfies a Doeblin
type condition that implies

\[ \lim_{n\to\infty} \|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} = 0. \]

In the absence of a true target distribution and following [4], we call this 
property merging. Of course, this does not qualify as a quantitative result. 
Extrapolating from the behavior of each kernel Ki taken individually, we 
may hope to show that, if $\lim_{N\to\infty} t_N/N^2 = \infty$ then

\[ \lim_{N\to\infty} \|K_{0,t_N}(x,\cdot) - K_{0,t_N}(y,\cdot)\|_{\mathrm{TV}} = 0. \]

The aim of this paper and the companion paper [32] is to present tech- 
niques that apply to this type of problem. The simple minded problem out- 
lined above is actually quite challenging and we will not be able to resolve it 
here without some additional hypotheses. However, we show how to adapt 
techniques such as singular values, Nash and log-Sobolev inequalities to time 
inhomogeneous chains and provide a variety of examples where these tools 
apply. In [32], we discussed singular value techniques. Here, we focus on 
Nash and log-Sobolev inequalities. The examples treated here (as well as 
those treated in [32, 33]) are quite particular despite the fact that one may 
believe that the techniques we use are widely applicable. Whether or not such 
a belief is warranted is a very interesting and, so far, unanswered question. 
This is deeply related to the notion of c-stability that is introduced here and 
in [32]. The examples we present here and in [30, 32, 33] are about the only 
existing evidence of successful quantitative analysis of time inhomogeneous 
Markov chains. 

A more detailed introduction to these questions is in [32]. The references 
[17, 30] discuss singular value techniques in the case of time inhomogeneous 
chains that admit an invariant distribution [all kernels $K_i$ in the sequence
$(K_i)_1^\infty$ share a common invariant distribution]. Time inhomogeneous random
walks on finite groups provide a large collection of such examples (see
also [24] for a particularly interesting example: semirandom transpositions). 
The papers [7, 14] are also concerned with quantitative results for time in- 
homogeneous Markov chains. In particular, the techniques developed in [7] 
are closely related to ours and we will use some of their results concerning 
the modified logarithmic Sobolev inequality. References on the basic theory 
of time inhomogeneous Markov chains are [19, 26, 35-37]. For a different
perspective, see also [3]. 

A short review of the relevant aspects of the time inhomogeneous Markov 
chain literature, including the use of "ergodic coefficients" can be found in 
[34]. The vast literature on the famous simulated annealing algorithm is not
very relevant for our purpose but we refer to [6] for a recent discussion. The 
paper [5] concerned with filtering and genetic algorithms describes problems 
that are related in spirit to the present work. 

1.2. Basic notation. Let V be a finite set equipped with a sequence of 
kernels $(K_n)_1^\infty$ such that, for each $n$, $K_n(x,y) \ge 0$ and $\sum_y K_n(x,y) = 1$. An
associated Markov chain is a $V$-valued random process $X = (X_n)_0^\infty$ such
that, for all $n$,

\[ P(X_n = y \mid X_{n-1} = x, \dots, X_0 = x_0) = P(X_n = y \mid X_{n-1} = x) = K_n(x,y). \]

The distribution $\mu_n$ of $X_n$ is determined by the initial distribution $\mu_0$ and
given by

\[ \mu_n(y) = \sum_{x \in V} \mu_0(x) K_{0,n}(x,y), \]

where $K_{n,m}(x,y)$ is defined inductively for each $n$ and each $m > n$ by

\[ K_{n,m}(x,y) = \sum_{z \in V} K_{n,m-1}(x,z) K_m(z,y) \]

with $K_{n,n} = I$ (the identity). If we interpret the $K_n$'s as matrices, then this
definition means that $K_{n,m} = K_{n+1} \cdots K_m$. This paper is mostly concerned
with the behavior of the measures $K_{0,n}(x,\cdot)$ as $n$ tends to infinity. In the case
of time homogeneous chains where all $K_i = Q$ are equal, we write $K_{0,n} = Q^n$.

Our main interest is in ergodic like properties of time inhomogeneous
Markov chains. In general, one does not expect $\mu_n = \mu_0 K_{0,n}$ to converge
toward a limiting distribution. Instead, the natural notion is that of merging 
of measures as discussed in [4]. 
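
As an illustration of the notation, the following minimal sketch computes the product kernel $K_{n,m} = K_{n+1} \cdots K_m$ and the distribution $\mu_n = \mu_0 K_{0,n}$ in exact rational arithmetic. The two 2-state kernels and the helper names (`product_kernel`, `push_forward`) are assumptions chosen for the example; they do not come from the paper.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def identity(size):
    """Identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]


def product_kernel(kernels, n, m):
    """Return K_{n,m} = K_{n+1} ... K_m, where kernels[i - 1] holds K_i."""
    result = identity(len(kernels[0]))
    for i in range(n + 1, m + 1):
        result = mat_mul(result, kernels[i - 1])
    return result


def push_forward(mu0, kernel):
    """Return the row vector mu0 * kernel."""
    size = len(mu0)
    return [sum(mu0[x] * kernel[x][y] for x in range(size)) for y in range(size)]


# Hypothetical example kernels on a two-point space (illustration only).
half = Fraction(1, 2)
K1 = [[half, half], [Fraction(1, 4), Fraction(3, 4)]]
K2 = [[Fraction(3, 4), Fraction(1, 4)], [half, half]]
kernels = [K1, K2, K1]  # K_1, K_2, K_3

K03 = product_kernel(kernels, 0, 3)
mu3 = push_forward([Fraction(1), Fraction(0)], K03)  # chain started at state 0
```

Starting the chain at a single state $x$ simply reads off the row $K_{0,n}(x,\cdot)$, which is the object studied in the rest of the paper.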

Definition 1.1. Fix a sequence of Markov kernels as above. We say the
sequence is merging if for any $x, y, z \in V$,

\[ \lim_{n\to\infty} K_{0,n}(x,z) - K_{0,n}(y,z) = 0. \tag{1.1} \]

Remark 1.2. If the sequence $(K_i)_1^\infty$ is merging then, for any two starting
distributions $\mu_0, \nu_0$, the measures $\mu_n = \mu_0 K_{0,n}$ and $\nu_n = \nu_0 K_{0,n}$ are merging,
that is, $\mu_n - \nu_n \to 0$. Since we assume the set $V$ is finite, merging is
equivalent to $\lim_{n\to\infty} \|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} = 0$. Hence, we also refer to
this property as "total variation merging."

Total variation merging is also referred to as weak ergodicity in the literature
and there exists a body of work concerned with understanding when
weak ergodicity holds. See, for example, [19, 25-27, 35]. A main tool used to 
show weak ergodicity is that of contraction coefficients. Furthermore, in [16], 
Birkhoff's contraction coefficient is used to study ratio ergodicity which is 
equivalent to what we will later call relative-sup merging. However, it should 
be noted that even for time homogeneous chains Birkhoff coefficients and 
related methods fail to provide useful quantitative bounds in most cases. 

Our goal is to develop quantitative results in the context of time inhomo- 
geneous chains in the spirit of the work of Aldous, Diaconis and others. In 
these works, precise estimates of the mixing time of ergodic chains are ob- 
tained. Typically, a family of Markov chains indexed by a parameter, say N, 
is studied. Loosely speaking, as the parameter $N$ increases, the complexity
and size of the chain increases and one seeks bounds that depend on N in 
an explicit quantitative way. See, for example, [1, 2, 8-13, 15, 22, 23, 28]. 
Efforts in this direction for time inhomogeneous chains are in [7, 14, 16- 
18, 24, 30, 32]. Still, there are only a very small number of results and 
examples concerning the quantitative study of merging as defined above for 
time inhomogeneous Markov chains so that it is not very clear what kind of 
results should be expected and what kind of hypotheses are reasonable. We 
refer the reader to [32] for a more detailed discussion. 

The following definition is useful to capture the spirit of our study. It 
indicates that the simplest case we would like to think about is the case 
when the sequence $K_i$ is obtained by deterministic but arbitrary choices
between a finite number of kernels $\mathcal{Q} = \{Q_1, \dots, Q_k\}$.

Definition 1.3. We say that a set $\mathcal{Q}$ of Markov kernels on $V$ is merging
in total variation if for any sequence $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i$, we have

\[ \forall x, y \in V, \qquad \lim_{n\to\infty} \|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} = 0. \]

In the study of ergodicity of finite Markov chains, the convergence toward
the target distribution is measured using various notions of distance between
probability measures. These include the total variation distance

\[ \|\mu - \nu\|_{\mathrm{TV}} = \sup_{A \subset V} \{\mu(A) - \nu(A)\}, \]

the chi-square distance (w.r.t. $\nu$. Note the asymmetry between $\mu$ and $\nu$.)

\[ d_2(\mu,\nu) = \Bigl( \sum_{x \in V} \Bigl| \frac{\mu(x)}{\nu(x)} - 1 \Bigr|^2 \nu(x) \Bigr)^{1/2} \]

and the relative sup-distance (again, note the asymmetry)

\[ \max_{x \in V} \Bigl| \frac{\mu(x)}{\nu(x)} - 1 \Bigr|. \]

These will be used here to measure merging.
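
To make the asymmetry concrete, here is a small sketch of the three distances for measures stored as lists. It assumes the reference measure $\nu$ is strictly positive; the function names and the two sample measures are illustrative choices, not part of the paper.

```python
def total_variation(mu, nu):
    """sup_A {mu(A) - nu(A)}, attained at A = {x : mu(x) > nu(x)}."""
    return sum(max(m - n, 0.0) for m, n in zip(mu, nu))


def chi_square_distance(mu, nu):
    """Chi-square distance of mu with respect to nu (nu must be positive)."""
    return sum((m / n - 1.0) ** 2 * n for m, n in zip(mu, nu)) ** 0.5


def relative_sup_distance(mu, nu):
    """max_x |mu(x)/nu(x) - 1| (nu must be positive)."""
    return max(abs(m / n - 1.0) for m, n in zip(mu, nu))


# Hypothetical sample measures on a three-point space.
mu = [0.5, 0.3, 0.2]
nu = [0.25, 0.25, 0.5]

tv = total_variation(mu, nu)
chi = chi_square_distance(mu, nu)
rel = relative_sup_distance(mu, nu)
```

On this example the three quantities are ordered as $2\|\mu-\nu\|_{\mathrm{TV}} \le d_2(\mu,\nu) \le \max_x |\mu(x)/\nu(x) - 1|$, and swapping the roles of $\mu$ and $\nu$ changes the last two but not the first.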

1.3. Merging time. In the quantitative theory of ergodic time homoge- 
neous Markov chains, the notion of mixing time plays a crucial role. For 
time inhomogeneous chains, we propose to consider the following definitions.

Definition 1.4. Fix $\epsilon \in (0,1)$. Given a sequence $(K_i)_1^\infty$ of Markov kernels
on a finite set $V$, we call max total variation merging time the quantity

\[ T_{\mathrm{TV}}(\epsilon) = \inf\Bigl\{ n : \max_{x,y \in V} \|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} < \epsilon \Bigr\}. \]

Definition 1.5. Fix $\epsilon \in (0,1)$. We say that a set $\mathcal{Q}$ of Markov kernels
on $V$ has max total variation $\epsilon$-merging time at most $T$ if for any sequence
$(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i$, we have $T_{\mathrm{TV}}(\epsilon) \le T$, that is,

\[ \forall t \ge T, \qquad \max_{x,y \in V} \{\|K_{0,t}(x,\cdot) - K_{0,t}(y,\cdot)\|_{\mathrm{TV}}\} < \epsilon. \]
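
The max total variation merging time of a given sequence can be computed by brute force on small examples. The sketch below is only an illustration under stated assumptions: the callable `kernel_at(i)` (returning $K_i$), the search `horizon`, and the two alternating 2-state kernels are hypothetical choices, and the function returns `None` when no $n$ up to the horizon works.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def tv_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(u - v) for u, v in zip(p, q))


def max_tv_merging_time(kernel_at, size, eps, horizon):
    """Smallest n <= horizon with max_{x,y} TV(K_{0,n}(x,.), K_{0,n}(y,.)) < eps."""
    current = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for n in range(horizon + 1):
        if n > 0:
            current = mat_mul(current, kernel_at(n))  # K_{0,n} = K_{0,n-1} K_n
        worst = max(tv_distance(current[x], current[y])
                    for x in range(size) for y in range(size))
        if worst < eps:
            return n
    return None


# Hypothetical sequence K_1, K_2, K_1, K_2, ... on a two-point space.
K1 = [[0.5, 0.5], [0.25, 0.75]]
K2 = [[0.75, 0.25], [0.5, 0.5]]


def alternating(i):
    return K1 if i % 2 == 1 else K2


merging_time = max_tv_merging_time(alternating, 2, 0.01, 50)
```

For these two kernels each step contracts the distance between the two rows by the factor $1/4$, so the search stops at the first $n$ with $4^{-n} < \epsilon$.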

Of course, merging can be measured in ways other than total variation. 
Also merging is a bit less flexible than mixing in this respect since there 
is no reference measure. One very natural and much stronger notion than 
total variation is relative sup-distance. For time inhomogeneous chains, total 
variation merging does not necessarily imply relative-sup merging as defined 
below. See [32]. 

Definition 1.6. We say a sequence $(K_i)_1^\infty$ of Markov kernels on a finite
set $V$ is merging in relative-sup if for all $x, y, z \in V$

\[ \lim_{n\to\infty} \frac{K_{0,n}(x,z)}{K_{0,n}(y,z)} = 1 \]

with the convention that $0/0 = 1$ and $a/0 = \infty$ for $a > 0$. Fix $\epsilon \in (0,1)$, we
call relative-sup merging time the quantity

\[ T_\infty(\epsilon) = \inf\Bigl\{ n : \max_{x,y,z \in V} \Bigl| \frac{K_{0,n}(x,z)}{K_{0,n}(y,z)} - 1 \Bigr| < \epsilon \Bigr\}. \]

Definition 1.7. We say a set $\mathcal{Q}$ of Markov kernels on $V$ is merging
in relative-sup if any sequence $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i$ is merging in
relative-sup.

Fix $\epsilon \in (0,1)$. We say that $\mathcal{Q}$ has relative-sup $\epsilon$-merging time at most $T$
if for any sequence $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i$, we have $T_\infty(\epsilon) \le T$, that is,

\[ \forall t \ge T, \qquad \max_{x,y,z \in V} \Bigl| \frac{K_{0,t}(x,z)}{K_{0,t}(y,z)} - 1 \Bigr| < \epsilon. \]

The following problem is open. It is a quantitative version of the problem
stated at the beginning of the introduction.

Problem 1.8. Let $V_N = \{0, \dots, N\}$ and $c \in [1,\infty)$. Let $\mathcal{Q}_N$ be the set
of all birth and death chains $Q$ on $V_N$ with $Q(x,y) \in [1/4, 3/4]$ if $|x - y| \le 1$,
and reversible measure $\pi$ satisfying $1/4 \le (N+1)\pi(x) \le 4$, $x \in V_N$.

1. Prove or disprove that there exists a constant $A$ independent of $N$ such
that $\mathcal{Q}_N$ has total variation $\epsilon$-merging time at most $AN^2(1 + \log_+ 1/\epsilon)$.

2. Prove or disprove that there exists a constant $A$ independent of $N$ such
that $\mathcal{Q}_N$ has relative-sup $\epsilon$-merging time at most $AN^2(1 + \log_+ 1/\epsilon)$.

Remark 1.9. This problem is open (in most cases) even if one considers
a sequence $(K_i)_1^\infty$ drawn from a set $\mathcal{Q} = \{K_1, K_2\}$ of two kernels. Observe
that the hypothesis that the invariant measures $\pi_i$ are all comparable to the
uniform plays some role. How to harvest the global hypothesis of comparable
stationary distributions $\pi_i$ is not entirely clear. See Theorem 1.14 below for
a partial solution.

If $\pi_1$ and $\pi_2$ are not comparable, it is possible for $(K_1, \pi_1)$ and $(K_2, \pi_2)$
to have the same mixing time yet for $\mathcal{Q} = \{K_1, K_2\}$ to have a merging time
of a higher order. Assume that $K_1$ and $K_2$ are two biased random walks
with equal drift, one drift to the left, the other to the right. Despite the fact
that each of these random walks has a relative-sup mixing time of order
$N$, the inhomogeneous chain driven by the sequence $K_1 K_2 K_1 K_2 \cdots$ has a
relative-sup merging time of order $N^2$, see [32].

1.4. Stability. In this section, we consider a property, c-stability, that 
plays a crucial role in the techniques we develop to provide quantitative 
bounds for time inhomogeneous Markov chains. This property was intro- 
duced and discussed in [32]. It is a straightforward generalization of the 
property of sharing the same invariant measure. Unfortunately, it is hard to 
check. 

Definition 1.10. Fix $c \ge 1$. A sequence of Markov kernels $(K_n)_1^\infty$ on a
finite set $V$ is $c$-stable if there exists a measure $\mu_0$ such that

\[ \forall n \ge 0,\ x \in V, \qquad c^{-1} \le \frac{\mu_n(x)}{\mu_0(x)} \le c, \tag{1.2} \]

where $\mu_n = \mu_0 K_{0,n}$. If this holds, we say that $(K_n)_1^\infty$ is $c$-stable with respect
to the measure $\mu_0$.

Definition 1.11. A set $\mathcal{Q}$ of Markov kernels is $c$-stable with respect to
a measure $\mu_0$ if any sequence $(K_i)_1^\infty$ such that $K_i \in \mathcal{Q}$ for all $i$ is $c$-stable
with respect to $\mu_0$.

Remark 1.12. If all $K_i$ share the same invariant distribution $\pi$ then
$(K_i)_1^\infty$ is 1-stable with respect to $\pi$.
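
Since $c$-stability quantifies over all $n$, it cannot be verified by simulation, but the ratio in (1.2) can be tracked over a finite horizon to obtain a lower bound on the best possible $c$. The following sketch does this in exact arithmetic; the function name `stability_constant` and all example kernels are assumptions made for illustration.

```python
from fractions import Fraction


def push_forward(mu, kernel):
    """Return the row vector mu * kernel."""
    size = len(mu)
    return [sum(mu[x] * kernel[x][y] for x in range(size)) for y in range(size)]


def stability_constant(mu0, kernels):
    """Smallest c >= 1 with 1/c <= mu_n(x)/mu0(x) <= c for n = 0..len(kernels).

    Only a finite horizon is examined, so the result is a lower bound on any
    constant c for which the infinite sequence could be c-stable w.r.t. mu0.
    """
    c = Fraction(1)
    mu = list(mu0)
    for kernel in kernels:
        mu = push_forward(mu, kernel)
        for x in range(len(mu0)):
            ratio = mu[x] / mu0[x]  # requires mu0 > 0 and mu > 0
            c = max(c, ratio, 1 / ratio)
    return c


half = Fraction(1, 2)
# Two doubly stochastic kernels: both keep the uniform measure invariant.
A = [[half, half, 0], [0, half, half], [half, 0, half]]
B = [[0, half, half], [half, 0, half], [half, half, 0]]
uniform = [Fraction(1, 3)] * 3
c_shared = stability_constant(uniform, [A, B, A, B])

# A kernel applied to a measure that is not invariant for it.
K2 = [[Fraction(3, 4), Fraction(1, 4)], [half, half]]
pi1 = [Fraction(1, 3), Fraction(2, 3)]
c_mismatch = stability_constant(pi1, [K2])
```

The first computation illustrates Remark 1.12 (a shared invariant measure gives $c = 1$); the second shows the ratio leaving 1 as soon as the starting measure is not invariant for the kernel applied.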

Remark 1.13. Suppose a set $\mathcal{Q}$ of aperiodic irreducible Markov kernels
is $c$-stable with respect to a measure $\mu_0$. Let $\pi$ be an invariant measure for
some $Q \in \mathcal{Q}$. Then we must have

\[ x \in V, \qquad \frac{1}{c} \le \frac{\pi(x)}{\mu_0(x)} \le c. \]

Hence, $\mathcal{Q}$ is also $c^2$-stable with respect to $\pi$ and any two invariant measures
$\pi, \pi'$ for kernels $Q, Q' \in \mathcal{Q}$ must satisfy

\[ x \in V, \qquad \frac{1}{c^2} \le \frac{\pi(x)}{\pi'(x)} \le c^2. \]

The following theorem which relates to a special case of Problem 1.8 
illustrates the role of c-stability. 

Theorem 1.14. Let $V_N = \{0, \dots, N\}$. Let $\mathcal{Q}_N$ be the set of all birth and
death chains $Q$ on $V_N$ with

\[ Q(x,y) \in [1/4, 3/4] \quad \text{if } |x - y| \le 1 \]

and reversible measure $\pi$ satisfying $1/4 \le (N+1)\pi(x) \le 4$, $x \in V_N$. Let
$(K_i)_1^\infty$ be a sequence of birth and death Markov kernels on $V_N$ with $K_i \in \mathcal{Q}_N$.
Assume that $(K_i)_1^\infty$ is $c$-stable with respect to the uniform measure on $V_N$,
for some constant $c \ge 1$ independent of $N$. Then there exists a constant
$A = A(c)$ (in particular, independent of $N$) such that the relative-sup merging
time for $(K_i)_1^\infty$ on $V_N$ is bounded by

\[ T_\infty(\epsilon) \le AN^2(1 + \log_+ 1/\epsilon). \]

This will be proved later in a stronger form in Section 2.4. In [32] the
weaker conclusion $T_\infty(\epsilon) \le AN^2(\log N + \log_+ 1/\epsilon)$ was obtained using singular
value techniques. Here, we will use Nash inequalities to obtain $T_\infty(\epsilon) \le
AN^2(1 + \log_+ 1/\epsilon)$.

It is possible that the set $\mathcal{Q}_N$ is $c$-stable with respect to the uniform
measure for some $c$. Indeed, it is tempting to conjecture that this is the
case although the evidence is rather limited (see also the discussion in [34]).
If this is true, then Theorem 1.14 solves Problem 1.8. However, we do not
know how to approach the problem of proving $c$-stability for $\mathcal{Q}_N$.

Remark 1.15. While the assumption of $c$-stability in Theorem 1.14 is
quite strong, Sections 4.2 and 5 of [32] give specific examples of families $\mathcal{Q}_N$
for which it holds. Further, we note that the question of whether or not
$c$-stability holds is extremely natural and interesting in itself.
2. Singular values and Nash inequalities. One key idea in the study of
Markov chains is to associate to a Markov kernel $K$ the operator $K : f \mapsto
Kf = \sum_y K(\cdot,y) f(y)$. In the case of time homogeneous chains, one uses the
basic fact that this operator acts on $\ell^p(\pi)$ with norm 1 when $\pi$ is an invariant
measure.

In the case of time inhomogeneous chains, it is crucial to consider $K$ as
an operator between $\ell^p$ spaces with different measures in the domain and
target spaces. The following simple observation is key.

Given a measure $\mu$ and a Markov kernel $K$ on a finite set $V$, set $\mu' = \mu K$.
Fix $p \in [1, \infty)$ and consider $K$ as a linear operator

\[ K = K_\mu : \ell^p(\mu') \to \ell^p(\mu), \qquad Kf(x) = \sum_y K(x,y) f(y). \tag{2.1} \]

Then

\[ \|K\|_{\ell^p(\mu') \to \ell^p(\mu)} = \sup\{ \|Kf\|_{\ell^p(\mu)} : f \in \ell^p(\mu'),\ \|f\|_{\ell^p(\mu')} \le 1 \} = 1. \tag{2.2} \]

This follows from Jensen's inequality. See, for example, [7, 32]. We will use
the notation $K_\mu$ whenever we need to emphasize the fact that $K$ is viewed
as an operator between $\ell^p(\mu K)$ and $\ell^q(\mu)$ for some $1 \le p, q \le \infty$. When the
context is clear, we will drop the subscript $\mu$ as was done above.
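
The contraction property (2.2) is easy to test numerically: with $\mu' = \mu K$, the $\ell^p(\mu)$ norm of $Kf$ never exceeds the $\ell^p(\mu')$ norm of $f$, and constants give equality. The kernel, the measure, and the test functions in this sketch are hypothetical choices for illustration only.

```python
def lp_norm(f, mu, p):
    """Norm of f in l^p(mu) for finite p >= 1."""
    return sum(abs(v) ** p * m for v, m in zip(f, mu)) ** (1.0 / p)


def apply_kernel(kernel, f):
    """Kf(x) = sum_y K(x, y) f(y)."""
    return [sum(k * v for k, v in zip(row, f)) for row in kernel]


def push_forward(mu, kernel):
    """Return the measure mu K."""
    size = len(mu)
    return [sum(mu[x] * kernel[x][y] for x in range(size)) for y in range(size)]


# Hypothetical kernel and starting measure on a three-point space.
K = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
mu = [0.2, 0.3, 0.5]
mu_prime = push_forward(mu, K)  # domain measure mu' = mu K

functions = [[1.0, -2.0, 3.0], [0.0, 1.0, 0.0], [5.0, 0.5, -1.0]]
contraction_holds = all(
    lp_norm(apply_kernel(K, f), mu, p) <= lp_norm(f, mu_prime, p) + 1e-12
    for f in functions for p in (1, 2, 3)
)
```

Note that the domain norm uses $\mu'$ while the target norm uses $\mu$; measuring both sides with the same measure would in general break the inequality.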

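2.1. Using various distances. Given a sequence of Markov kernels $(K_i)_1^\infty$,
fix a starting measure $\mu_0$ and set $\mu_n = \mu_0 K_{0,n}$. We will assume that $\mu_n > 0$
for all $n$. Note that if $\mu_0 > 0$ and $K_n$ are all irreducible then $\mu_n > 0$ for all
$n \ge 0$. We are interested in the behavior of

\[ d_p(K_{0,n}(x,\cdot), \mu_n) = \Bigl( \sum_y \Bigl| \frac{K_{0,n}(x,y)}{\mu_n(y)} - 1 \Bigr|^p \mu_n(y) \Bigr)^{1/p}, \qquad p \ge 1. \]

For $p > 1$, a classical argument involving the duality between $\ell^p$ and $\ell^q$ where
$1 = 1/p + 1/q$, yields

\[ d_p(K_{0,n}(x,\cdot), \mu_n) = \sup\Bigl\{ \sum_y [K_{0,n}(x,y) f(y) - \mu_n(y) f(y)] : \|f\|_{\ell^q(\mu_n)} \le 1 \Bigr\} \]

and one checks that the function

\[ n \mapsto d_p(K_{0,n}(x,\cdot), \mu_n) \]

is nonincreasing (see [32]). Of course,

\[ 2\|K_{0,n}(x,\cdot) - \mu_n\|_{\mathrm{TV}} = d_1(K_{0,n}(x,\cdot), \mu_n) \]

and, if $1 \le p \le r \le \infty$,

\[ d_p(K_{0,n}(x,\cdot), \mu_n) \le d_r(K_{0,n}(x,\cdot), \mu_n). \]

In particular,

\[ 2\|K_{0,n}(x,\cdot) - \mu_n\|_{\mathrm{TV}} \le d_2(K_{0,n}(x,\cdot), \mu_n) \tag{2.3} \]

and

\[ \|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} \le \max_{x \in V} \{ d_2(K_{0,n}(x,\cdot), \mu_n) \}. \tag{2.4} \]

Further, if

\[ \max_{x,z} \Bigl| \frac{K_{0,n}(x,z)}{\mu_n(z)} - 1 \Bigr| \le \epsilon \le 1/2, \]

then

\[ \max_{x,y,z} \Bigl| \frac{K_{0,n}(x,z)}{K_{0,n}(y,z)} - 1 \Bigr| \le 4\epsilon. \]

To see the last inequality, note that if $1 - \epsilon \le a/b,\ c/b \le 1 + \epsilon$ with $\epsilon \in (0, 1/2)$
then

\[ 1 - 2\epsilon \le \frac{1-\epsilon}{1+\epsilon} \le \frac{a}{c} \le \frac{1+\epsilon}{1-\epsilon} \le 1 + 4\epsilon. \]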
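
The relations between the distances $d_p$ and the passage from a bound relative to $\mu_n$ to a bound on ratios of rows can be checked on a toy example. In the sketch below the two rows play the role of $K_{0,n}(x,\cdot)$ and $K_{0,n}(y,\cdot)$ and `mu` plays the role of $\mu_n$; all numerical values and names are assumptions made for illustration.

```python
def d_p(row, mu, p):
    """d_p(row, mu) for p in [1, inf]; mu must be strictly positive."""
    if p == float("inf"):
        return max(abs(k / m - 1.0) for k, m in zip(row, mu))
    return sum(abs(k / m - 1.0) ** p * m for k, m in zip(row, mu)) ** (1.0 / p)


# Hypothetical rows of K_{0,n} and the measure mu_n (here their average).
mu = [0.25, 0.25, 0.5]
rows = [[0.3, 0.2, 0.5], [0.2, 0.3, 0.5]]

# Relative-sup bound with respect to mu.
eps = max(d_p(row, mu, float("inf")) for row in rows)

# Worst ratio between two rows, the quantity bounded by 4 * eps when eps <= 1/2.
worst_ratio = max(abs(rows[x][z] / rows[y][z] - 1.0)
                  for x in range(2) for y in range(2) for z in range(3))
```

Here `eps` is about $0.2$ and the worst row ratio is about $0.5$, comfortably below $4\epsilon$.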

2.2. Singular values. In [32], we developed basic inequalities for
$d_2(K_{0,n}(x,\cdot), \mu_n)$ based on singular value decompositions. The basic fact here is that,
if $\mu$ is a probability measure on $V$, $K$ a Markov kernel and $\mu' = \mu K$ then

\[ d_2(K(x,\cdot), \mu')^2 = \sum_{i=1}^{|V|-1} |\psi_i(x)|^2 \sigma_i^2, \]

where $\sigma_i$, $i = 0, \dots, |V| - 1$, are the singular values of $K_\mu : \ell^2(\mu') \to \ell^2(\mu)$ in
nonincreasing order, that is, the square roots of the eigenvalues of
$K_\mu K_\mu^* : \ell^2(\mu) \to \ell^2(\mu)$, where $K_\mu^* : \ell^2(\mu) \to \ell^2(\mu')$ is the adjoint of
$K_\mu : \ell^2(\mu') \to \ell^2(\mu)$. The $\psi_i$'s form an orthonormal basis for $\ell^2(\mu)$ and are
eigenfunctions of $K_\mu K_\mu^*$, $\psi_i$ being associated with $\sigma_i^2$. Of course, the $\sigma_i^2$'s
can also be viewed as the eigenvalues of $K_\mu^* K_\mu : \ell^2(\mu') \to \ell^2(\mu')$.

In any case, a crucial fact for us here is that $\sigma_1$, the second largest singular
value of $K_\mu : \ell^2(\mu') \to \ell^2(\mu)$, is also the norm of $K - \mu' = K_\mu - \mu' : \ell^2(\mu') \to
\ell^2(\mu)$, that is,

\[ \sup\{ \|(K - \mu') f\|_{\ell^2(\mu)} : f \in \ell^2(\mu'),\ \|f\|_{\ell^2(\mu')} = 1 \} = \sigma_1. \]

Given a sequence $(K_i)_1^\infty$ of Markov kernels on $V$ and a positive measure
$\mu_0$, set $\mu_n = \mu_0 K_{0,n}$ and let $\sigma_1(K_i, \mu_{i-1})$ be the second largest singular value
of $K_i : \ell^2(\mu_i) \to \ell^2(\mu_{i-1})$. Noting that

\[ K_{0,n} - \mu_n = (K_1 - \mu_1)(K_2 - \mu_2) \cdots (K_n - \mu_n), \]

we obtain

\[ \|K_{0,n} - \mu_n\|_{\ell^2(\mu_n) \to \ell^2(\mu_0)} \le \prod_{i=1}^{n} \sigma_1(K_i, \mu_{i-1}). \tag{2.5} \]

This inequality looks very promising, but this is rather misleading. There
is very little hope to compute or estimate the singular values $\sigma_1(K_i, \mu_{i-1})$,
even if we have a good grasp on the kernel $K_i$. The reason is that $\sigma_1(K_i, \mu_{i-1})$
depends very much on the unknown measure $\mu_{i-1}$. This is similar to the
problem one faces when studying an irreducible aperiodic time homogeneous
finite Markov chain for which one is not able to compute the stationary
measure (although this case is rarely discussed, it is the typical case). For
positive examples and a more detailed discussion, see [32].

2.3. Dirichlet forms. Given a reversible Markov kernel $Q$ with reversible
measure $\pi$ on a finite set $V$, the associated Dirichlet form is

\[ \mathcal{E}(f,f) = \mathcal{E}_{Q,\pi}(f,f) = \langle (I - Q) f, f \rangle_\pi
   = \frac{1}{2} \sum_{x,y} |f(x) - f(y)|^2 \pi(x) Q(x,y). \]

This definition is essential for the techniques considered in this paper. To
illustrate this, we note that the singular value $\sigma_1(K, \mu)$ associated to a
Markov kernel $K$ and a positive probability measure $\mu$ is the square root
of the second largest eigenvalue of $K_\mu^* K_\mu : \ell^2(\mu') \to \ell^2(\mu')$, $\mu' = \mu K$. This
operator is associated with the Markov kernel

\[ P(x,y) = \frac{1}{\mu'(x)} \sum_z \mu(z) K(z,x) K(z,y), \]

which is reversible with respect to $\mu'$ and has associated Dirichlet form

\[ \mathcal{E}_{P,\mu'}(f,f) = \frac{1}{2} \sum_{x,y,z} |f(x) - f(y)|^2 \mu(z) K(z,x) K(z,y). \]

Hence, using the classical variational formula for eigenvalues, we have

\[ 1 - \sigma_1^2(K, \mu) = \inf\Bigl\{ \frac{\mathcal{E}_{P,\mu'}(f,f)}{\mathrm{Var}_{\mu'}(f)} : f \in \ell^2(\mu'),\ \mathrm{Var}_{\mu'}(f) \ne 0 \Bigr\}, \]

where $\mathrm{Var}_{\mu'}(f) = \mu'(f^2) - \mu'(f)^2 = \sum_x |f(x) - \mu'(f)|^2 \mu'(x)$.
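
The kernel $P$ attached to $K^* K$ can be built directly from $K$ and $\mu$. The sketch below constructs it in exact arithmetic and makes it possible to check that $P$ is a Markov kernel, that it is reversible with respect to $\mu' = \mu K$, and that its Dirichlet form equals $\|f\|_{\ell^2(\mu')}^2 - \|Kf\|_{\ell^2(\mu)}^2$. The function names, the kernel, the measure and the test function are assumptions introduced for this example.

```python
from fractions import Fraction


def push_forward(mu, kernel):
    """Return the measure mu K."""
    size = len(mu)
    return [sum(mu[z] * kernel[z][y] for z in range(size)) for y in range(size)]


def reversible_kernel(kernel, mu):
    """P(x, y) = (1 / mu'(x)) * sum_z mu(z) K(z, x) K(z, y), with mu' = mu K."""
    size = len(mu)
    mu_prime = push_forward(mu, kernel)  # must be strictly positive
    P = [[sum(mu[z] * kernel[z][x] * kernel[z][y] for z in range(size)) / mu_prime[x]
          for y in range(size)] for x in range(size)]
    return P, mu_prime


def dirichlet_form(P, nu, f):
    """(1/2) * sum_{x,y} |f(x) - f(y)|^2 nu(x) P(x, y)."""
    size = len(nu)
    return Fraction(1, 2) * sum((f[x] - f[y]) ** 2 * nu[x] * P[x][y]
                                for x in range(size) for y in range(size))


# Hypothetical kernel, measure and test function on a three-point space.
half, quarter = Fraction(1, 2), Fraction(1, 4)
K = [[half, half, 0], [quarter, half, quarter], [0, half, half]]
mu = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
P, mu_prime = reversible_kernel(K, mu)
f = [Fraction(1), Fraction(-2), Fraction(3)]
energy = dirichlet_form(P, mu_prime, f)
```

The identity between the Dirichlet form and the drop in squared $\ell^2$ norm is what links Nash inequalities for $P$ to the decay of $\|K_{m,n} f\|_{\ell^2}$ in the arguments that follow.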

2.4. Nash inequalities. The use of Nash inequalities to study the con- 
vergence of ergodic (time homogeneous) finite Markov chains was developed 
in [11] (Section 7 of [11] discusses time homogeneous chains that admit an
invariant measure). We refer the reader to that paper for background on 
this technique. In this section, we observe that it can be implemented in the 
context of time inhomogeneous chains. We start with some basic material. 
Definition 2.1. Let $V$ be a state space equipped with a Markov kernel
$K$ and probability measures $\mu$ and $\nu$. If $1 \le p, q \le \infty$ then

\[ \|K\|_{\ell^p(\mu) \to \ell^q(\nu)} = \sup_{\|f\|_{\ell^p(\mu)} \le 1} \{ \|Kf\|_{\ell^q(\nu)} \}. \]

If $p$ and $q$ are conjugate exponents, that is, if $1/p + 1/q = 1$, then

\[ \|f\|_{\ell^p(\mu)} = \sup_{\|g\|_{\ell^q(\mu)} \le 1} \{ \langle f, g \rangle_\mu \}. \]

The following proposition is well known in a much more general context.

Proposition 2.2. Let $K$ be a Markov kernel. Let $K_\mu : \ell^2(\mu K) \to \ell^2(\mu)$
be the Markov operator on $V$ with adjoint $K_\mu^* : \ell^2(\mu) \to \ell^2(\mu K)$ with respect
to the inner product

\[ \langle Kf, g \rangle_\mu = \langle f, K^* g \rangle_{\mu K}. \]

If $1 \le p, r, s \le \infty$, $1/p + 1/q = 1$ and $1/r + 1/s = 1$ then

\[ \|K_\mu\|_{\ell^p(\mu K) \to \ell^r(\mu)} = \|K_\mu^*\|_{\ell^s(\mu) \to \ell^q(\mu K)}. \]

Let now $(K_i)_1^\infty$ be a sequence of Markov kernels on $V$. Fix a positive
probability measure $\mu_0$ and set $\mu_n = \mu_0 K_{0,n}$ as usual. Consider
$K_i : \ell^2(\mu_i) \to \ell^2(\mu_{i-1})$, its adjoint $K_i^* : \ell^2(\mu_{i-1}) \to \ell^2(\mu_i)$ and
$P_i = K_i^* K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$. The operator $P_i$ is given by the Markov kernel

\[ P_i(x,y) = \frac{1}{\mu_i(x)} \sum_z \mu_{i-1}(z) K_i(z,x) K_i(z,y). \tag{2.6} \]

This kernel is reversible with reversible measure $\mu_i$. We let

\[ \mathcal{E}_{P_i,\mu_i}(f,f) = \frac{1}{2} \sum_{x,y} |f(x) - f(y)|^2 \mu_i(x) P_i(x,y) \]

be the associated Dirichlet form on $\ell^2(\mu_i)$.

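Theorem 2.3. Referring to the setup and notation introduced above, let
$N \ge 1$ and assume that there are constants $C, D > 0$ such that for $1 \le m \le N$
the following Nash inequalities hold

\[ \forall f : V \to \mathbb{R}, \qquad \|f\|_{\ell^2(\mu_m)}^{2+1/D} \le C \Bigl( \mathcal{E}_{P_m,\mu_m}(f,f) + \frac{1}{N} \|f\|_{\ell^2(\mu_m)}^2 \Bigr) \|f\|_{\ell^1(\mu_m)}^{1/D}. \tag{2.7} \]

Then, for $0 \le m \le n \le N$,

\[ \|K_{m,n}\|_{\ell^2(\mu_n) \to \ell^\infty(\mu_m)} \le \Bigl( \frac{4CB}{n - m + 1} \Bigr)^{D}, \]

where $B = B(D,N) = (1 + 1/N)(1 + \lceil 4D \rceil)$.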
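
For orientation, the right-hand side of the theorem is the quantity $(4CB/(n-m+1))^D$ with $B = (1 + 1/N)(1 + \lceil 4D \rceil)$, and it is straightforward to evaluate. The helper names below are hypothetical, and the numerical values of $C$, $D$ and $N$ are arbitrary illustrations rather than constants taken from any example in the paper.

```python
import math


def nash_constant_B(D, N):
    """B(D, N) = (1 + 1/N) * (1 + ceil(4 D))."""
    return (1.0 + 1.0 / N) * (1 + math.ceil(4 * D))


def nash_norm_bound(C, D, N, m, n):
    """Evaluate (4 C B / (n - m + 1))^D for 0 <= m <= n <= N."""
    if not 0 <= m <= n <= N:
        raise ValueError("the bound is only stated for 0 <= m <= n <= N")
    return (4.0 * C * nash_constant_B(D, N) / (n - m + 1)) ** D


# Arbitrary illustrative constants.
bound_short = nash_norm_bound(C=1.0, D=0.5, N=100, m=0, n=10)
bound_long = nash_norm_bound(C=1.0, D=0.5, N=100, m=0, n=100)
```

The bound decays polynomially in the number of steps $n - m + 1$, with exponent $D$, and only holds up to the horizon $N$ appearing in the Nash inequality.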

Proof. Let $(K_i)_1^\infty$ be a sequence of Markov kernels on $V$ such that the
Nash inequalities (2.7) hold. Pick a function $f$ such that $\|f\|_{\ell^1(\mu_n)} = 1$. For
$0 \le m \le n \le N$ define

\[ t_n(n - m) = \|K_{m,n} f\|_{\ell^2(\mu_m)}^2. \]

Note that for any $n \ge 0$, $(t_n(i))_{i=0}^{n}$ is nonincreasing. Indeed, using the
contraction property (2.2), we have

\[ t_n(i+1) = \|K_{n-i-1,n} f\|_{\ell^2(\mu_{n-i-1})}^2 = \|K_{n-i} K_{n-i,n} f\|_{\ell^2(\mu_{n-i-1})}^2 \le \|K_{n-i,n} f\|_{\ell^2(\mu_{n-i})}^2 = t_n(i). \]

Moreover, note that for any $0 \le i \le n - 1$, $n \le N$,

\[ t_n(i)^{1 + 1/(2D)} \le C\bigl( t_n(i) - t_n(i+1) + t_n(i)/N \bigr), \]

where $C$ and $D$ are the constants in (2.7). This follows by applying the Nash
inequality to the function $K_{n-i,n} f$. Corollary 3.1 of [11] then yields that

\[ t_n(i) \le \Bigl( \frac{CB}{i+1} \Bigr)^{2D}, \qquad 0 \le i \le n \le N, \]

where $B = B(D,N) = (1 + 1/N)(1 + \lceil 4D \rceil)$. In particular, if $0 \le m \le n \le N$,

\[ \|K_{m,n}\|_{\ell^1(\mu_n) \to \ell^2(\mu_m)} \le \bigl( (CB)/(n - m + 1) \bigr)^{D}. \]

From Proposition 2.2 it follows that, for $0 \le m \le n \le N$,

\[ \|K_{m,n}^*\|_{\ell^2(\mu_m) \to \ell^\infty(\mu_n)} \le \bigl( (CB)/(n - m + 1) \bigr)^{D}. \]

Next we bound $\|K_{m,n}^*\|_{\ell^1(\mu_m) \to \ell^\infty(\mu_n)}$ for $0 \le m \le n \le N$. Consider the
quantity $M(N)$ where

\[ M(N) = \max_{0 \le m \le n \le N} \bigl\{ (n - m + 1)^{2D} \|K_{m,n}^*\|_{\ell^1(\mu_m) \to \ell^\infty(\mu_n)} \bigr\}. \]

Let $l = \lfloor (n - m)/2 \rfloor + m$, so that $0 \le m \le l \le n \le N$. We have

\[ \|K_{m,n}^*\|_{\ell^1(\mu_m) \to \ell^\infty(\mu_n)} \le \|K_{m,l}^*\|_{\ell^1(\mu_m) \to \ell^2(\mu_l)} \|K_{l,n}^*\|_{\ell^2(\mu_l) \to \ell^\infty(\mu_n)}. \]

Note that for all $0 \le m \le l \le N$

\[ \|K_{m,l}^*\|_{\ell^1(\mu_m) \to \ell^2(\mu_l)} \le \|K_{m,l}^*\|_{\ell^1(\mu_m) \to \ell^\infty(\mu_l)}^{1/2} \|K_{m,l}^*\|_{\ell^1(\mu_m) \to \ell^1(\mu_l)}^{1/2}. \tag{2.9} \]

This follows from the fact that for any function $f$

\[ \|K_{m,l}^* f\|_{\ell^2(\mu_l)}^2 \le \|K_{m,l}^* f\|_{\ell^\infty(\mu_l)} \|K_{m,l}^* f\|_{\ell^1(\mu_l)}. \]

By (2.2), we have

\[ \|K_{m,n}^*\|_{\ell^1(\mu_m) \to \ell^\infty(\mu_n)}
   \le \Bigl( \frac{CB}{n - l + 1} \Bigr)^{D} \|K_{m,l}^*\|_{\ell^1(\mu_m) \to \ell^\infty(\mu_l)}^{1/2}
   \le \Bigl( \frac{CB}{(n - l + 1)(l - m + 1)} \Bigr)^{D} M(N)^{1/2}
   \le \Bigl( \frac{4CB}{(n - m + 1)^2} \Bigr)^{D} M(N)^{1/2}. \]

The last inequality follows from the fact that

\[ n - l + 1 \ge \frac{n - m + 1}{2} \quad \text{and} \quad l - m + 1 \ge \frac{n - m + 1}{2}. \]

So we have $M(N) \le (4CB)^{2D}$ and it follows that for all $0 \le m \le n \le N$

\[ \|K_{m,n}^*\|_{\ell^1(\mu_m) \to \ell^\infty(\mu_n)} \le \Bigl( \frac{4CB}{n - m + 1} \Bigr)^{2D}. \]

By duality, we get that

\[ \|K_{m,n}\|_{\ell^1(\mu_n) \to \ell^\infty(\mu_m)} \le \Bigl( \frac{4CB}{n - m + 1} \Bigr)^{2D}. \]

Next, we use the Riesz-Thorin interpolation theorem, see [38], page 179,
which gives us the desired result. □

The next results show how Theorem 2.3 together with the singular value 
technique of Section 2.2 yields merging results. 

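Theorem 2.4. Referring to the above setup and notation, let $N \ge 1$ and
assume that there are constants $C, D > 0$ such that for $1 \le m \le N$ the Nash
inequalities

\[ \forall f : V \to \mathbb{R}, \qquad \|f\|_{\ell^2(\mu_m)}^{2+1/D} \le C \Bigl( \mathcal{E}_{P_m,\mu_m}(f,f) + \frac{1}{N} \|f\|_{\ell^2(\mu_m)}^2 \Bigr) \|f\|_{\ell^1(\mu_m)}^{1/D} \tag{2.10} \]

hold. Let $\sigma_1(K_m, \mu_{m-1})$ be the second largest singular value of
$K_m : \ell^2(\mu_m) \to \ell^2(\mu_{m-1})$, that is, the square root of the second largest eigenvalue of $P_m$.
Then, for $n \ge m$, $N \ge m \ge 0$, we have

\[ d_2(K_{0,n}(x,\cdot), \mu_n) \le \Bigl( \frac{4CB}{m+1} \Bigr)^{D} \prod_{i=m+1}^{n} \sigma_1(K_i, \mu_{i-1}), \tag{2.11} \]

where $B = (1 + 1/N)(1 + \lceil 4D \rceil)$. Moreover, for any $n = 2m + u$, $0 \le m \le N$, we have

\[ \max_{x,y \in V} \Bigl| \frac{K_{0,n}(x,y)}{\mu_n(y)} - 1 \Bigr| \le \Bigl( \frac{4CB}{m+1} \Bigr)^{2D} \prod_{i=m+1}^{m+u} \sigma_1(K_i, \mu_{i-1}). \tag{2.12} \]

Proof. We have

\[ \max_{x \in V} \{ d_2(K_{0,n}(x,\cdot), \mu_n)^2 \} = \|K_{0,n} - \mu_n\|_{\ell^2(\mu_n) \to \ell^\infty(\mu_0)}^2, \]

where $\mu_n$ is understood as the expectation operator $f \mapsto \mu_n(f)$. Moreover,
for any $0 \le m \le n$,

\[ K_{0,n} - \mu_n = K_{0,m}(K_{m,n} - \mu_n) \]

because $K_{0,m} \mu_n f = K_{0,m} \mu_n(f) = \mu_n(f)$. Hence, for $0 \le m \le N$,

\[ \|K_{0,n} - \mu_n\|_{\ell^2(\mu_n) \to \ell^\infty(\mu_0)}^2
   \le \|K_{0,m}\|_{\ell^2(\mu_m) \to \ell^\infty(\mu_0)}^2 \|K_{m,n} - \mu_n\|_{\ell^2(\mu_n) \to \ell^2(\mu_m)}^2
   \le \Bigl( \frac{4CB}{m+1} \Bigr)^{2D} \prod_{i=m+1}^{n} \sigma_1(K_i, \mu_{i-1})^2. \]

Using $B = N^{-1}(N + 1)(1 + \lceil 4D \rceil)$, gives (2.11). To obtain the stronger result
(2.12), write

\[ \max_{x,y \in V} \Bigl| \frac{K_{0,n}(x,y)}{\mu_n(y)} - 1 \Bigr| = \|K_{0,n} - \mu_n\|_{\ell^1(\mu_n) \to \ell^\infty(\mu_0)} \]

and

\[ \|K_{0,n} - \mu_n\|_{\ell^1(\mu_n) \to \ell^\infty(\mu_0)}
   \le \|K_{n-m,n}\|_{\ell^1(\mu_n) \to \ell^2(\mu_{n-m})} \times \|K_{m,n-m} - \mu_{n-m}\|_{\ell^2(\mu_{n-m}) \to \ell^2(\mu_m)} \times \|K_{0,m}\|_{\ell^2(\mu_m) \to \ell^\infty(\mu_0)}. \]

The stated bound (2.12) follows. □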

Just as we did for singular values, let us emphasize that the powerful
looking results stated in this theorem are actually extremely difficult to
apply. Again, the point is that the Dirichlet form $\mathcal{E}_{P_m,\mu_m}$, the space $\ell^2(\mu_m)$,
and the singular values $\sigma_1(K_m, \mu_{m-1})$ all involve the unknown sequence of
measures $\mu_n = \mu_0 K_{0,n}$, $n = 0, 1, \dots$. The following subsection gives similar but
more applicable results under additional hypotheses involving the notion of
$c$-stability.
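2.5. Nash inequality under c-stability. We state two results that parallel
Theorems 5.9 and 5.10 of [32].

Theorem 2.5. Fix $c \in (1, \infty)$. Let $(K_i)_1^\infty$ be a sequence of irreducible
Markov kernels on a finite set $V$. Assume that $(K_i)_1^\infty$ is $c$-stable with respect
to a positive probability measure $\mu_0$. For each $i$, set $\mu_0^i = \mu_0 K_i$ and let
$\sigma(K_i, \mu_0)$ be the second largest singular value of $K_i = K_{i,\mu_0}$ as an operator
from $\ell^2(\mu_0^i)$ to $\ell^2(\mu_0)$. Let $P_i^0 = K_{i,\mu_0}^* K_{i,\mu_0}$. Let $N \ge 1$ and assume that there
are constants $C, D > 0$ such that for $1 \le m \le N$ the Nash inequalities

\[ \forall f : V \to \mathbb{R}, \qquad \|f\|_{\ell^2(\mu_0^m)}^{2+1/D} \le C \Bigl( \mathcal{E}_{P_m^0,\mu_0^m}(f,f) + \frac{1}{N} \|f\|_{\ell^2(\mu_0^m)}^2 \Bigr) \|f\|_{\ell^1(\mu_0^m)}^{1/D} \tag{2.13} \]

hold. Then, for $n \ge m$, $N \ge m \ge 0$, we have

\[ d_2(K_{0,n}(x,\cdot), \mu_n) \le \Bigl( \frac{8Cc^{2+3/(2D)}(1 + \lceil 4D \rceil)}{m+1} \Bigr)^{D} \prod_{i=m+1}^{n} \Bigl( 1 - \frac{1 - \sigma(K_i, \mu_0)^2}{c^2} \Bigr)^{1/2}. \tag{2.14} \]

Moreover, for any $n = 2m + u$, $0 \le m \le N$, we have

\[ \max_{x,y \in V} \Bigl| \frac{K_{0,n}(x,y)}{\mu_n(y)} - 1 \Bigr| \le \Bigl( \frac{8Cc^{2+3/(2D)}(1 + \lceil 4D \rceil)}{m+1} \Bigr)^{2D} \prod_{i=m+1}^{m+u} \Bigl( 1 - \frac{1 - \sigma(K_i, \mu_0)^2}{c^2} \Bigr)^{1/2}. \]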



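Proof. First note that since $\mu_{i-1}/\mu_0 \in [1/c, c]$, we have $\mu_0^i/\mu_i \in [1/c, c]$.
Consider the operator $P_i$ with kernel

\[ P_i(x,y) = \frac{1}{\mu_i(x)} \sum_z \mu_{i-1}(z) K_i(z,x) K_i(z,y). \]

By assumption

\[ \mu_i(x) P_i(x,y) \ge c^{-1} \mu_0^i(x) \Bigl[ \frac{1}{\mu_0^i(x)} \sum_z \mu_0(z) K_i(z,x) K_i(z,y) \Bigr], \]

where the term in brackets on the right-hand side is the kernel of $P_i^0$. This
kernel has second largest eigenvalue $\sigma(K_i, \mu_0)^2$. A simple eigenvalue
comparison argument yields

\[ 1 - \sigma_1(K_i, \mu_{i-1})^2 \ge \frac{1}{c^2} \bigl( 1 - \sigma(K_i, \mu_0)^2 \bigr). \]

Further, comparison of measures and Dirichlet forms yields the Nash inequality

\[ \forall f : V \to \mathbb{R}, \qquad \|f\|_{\ell^2(\mu_m)}^{2+1/D} \le C c^{2+3/(2D)} \Bigl( \mathcal{E}_{P_m,\mu_m}(f,f) + \frac{1}{N} \|f\|_{\ell^2(\mu_m)}^2 \Bigr) \|f\|_{\ell^1(\mu_m)}^{1/D}. \]

Together with Theorem 2.4, this gives the stated result. □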



The next result is based on a stronger hypothesis. 



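Theorem 2.6. Fix $c \in (1, \infty)$. Let $\mathcal{Q}$ be a family of irreducible aperiodic
Markov kernels on a finite set $V$. Assume that $\mathcal{Q}$ is $c$-stable with respect to
some positive probability measure $\mu_0$.

Let $(K_i)_1^\infty$ be a sequence of Markov kernels with $K_i \in \mathcal{Q}$ for all $i$. Let $\pi_i$
be the invariant measure of $K_i$. Let $\widetilde{P}_i = K_i^* K_i$ where $K_i : \ell^2(\pi_i) \to \ell^2(\pi_i)$.
Let $\sigma_1(K_i)$ be the second largest singular value of $K_i$ as an operator on
$\ell^2(\pi_i)$. Let $N \ge 1$ and assume that there are constants $C, D > 0$ such that
for $1 \le m \le N$ the Nash inequalities

\[ \forall f : V \to \mathbb{R}, \qquad \|f\|_{\ell^2(\pi_m)}^{2+1/D} \le C \Bigl( \mathcal{E}_{\widetilde{P}_m,\pi_m}(f,f) + \frac{1}{N} \|f\|_{\ell^2(\pi_m)}^2 \Bigr) \|f\|_{\ell^1(\pi_m)}^{1/D} \tag{2.15} \]

hold. Then, for $n \ge m$, $N \ge m \ge 0$, we have

\[ d_2(K_{0,n}(x,\cdot), \mu_n) \le \Bigl( \frac{8Cc^{4+3/D}(1 + \lceil 4D \rceil)}{m+1} \Bigr)^{D} \prod_{i=m+1}^{n} \Bigl( 1 - \frac{1 - \sigma_1(K_i)^2}{c^4} \Bigr)^{1/2}. \tag{2.16} \]

Moreover, for any $n = 2m + u$, $0 \le m \le N$, we have

\[ \max_{x,y} \Bigl| \frac{K_{0,n}(x,y)}{\mu_n(y)} - 1 \Bigr| \le \Bigl( \frac{8Cc^{4+3/D}(1 + \lceil 4D \rceil)}{m+1} \Bigr)^{2D} \prod_{i=m+1}^{m+u} \Bigl( 1 - \frac{1 - \sigma_1(K_i)^2}{c^4} \Bigr)^{1/2}. \]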



Proof. Note that the hypothesis that $\mathcal{Q}$ is $c$-stable implies $\pi_i/\mu_j \in
[1/c^2, c^2]$ for all $i, j$. Consider again the operator $P_i$ and its kernel

\[ P_i(x,y) = \frac{1}{\mu_i(x)} \sum_z \mu_{i-1}(z) K_i(z,x) K_i(z,y). \]

By assumption

\[ \mu_i(x) P_i(x,y) \ge c^{-2} \pi_i(x) \Bigl[ \frac{1}{\pi_i(x)} \sum_z \pi_i(z) K_i(z,x) K_i(z,y) \Bigr] = c^{-2} \pi_i(x) \widetilde{P}_i(x,y). \]

A comparison argument similar to the one used in the previous proof yields
the desired result. □

3. Examples involving Nash inequalities. This section describes applica- 
tions of the Nash inequality technique to several examples. All these exam- 
ples are of the following general type. 

(1) There is a basic reversible model $(K, \pi)$ on a space $V_N$ (growing with
$N$) that is well understood because:

• We have good grasp on the second largest singular value $\sigma_N$ of $(K, \pi)$.

• The model $(K, \pi)$ satisfies a good Nash inequality, that is, an inequality
of the form



with $B, b$ independent of $N$ and $T_N \simeq (1 - \sigma_N)^{-1}$. Here, $f \simeq g$ implies
that there exist constants $d, D > 0$ such that $dg \le f \le Dg$.

• Together, the Nash inequality and second largest singular value estimate
yield the mixing time estimate

\[ \max_{x,y} \Bigl| \frac{K^t(x,y)}{\pi(y)} - 1 \Bigr| \le \eta \quad \text{if} \quad t \ge \frac{A(1 + \log_+ 1/\eta)}{1 - \sigma_N}, \]

where $A$ is independent of $N$.

(2) We are given a sequence $(K_i)_1^\infty$ or a set $\mathcal{Q}_N$ of Markov kernels on $V_N$
which satisfies:

• $(K_i)_1^\infty$ or $\mathcal{Q}_N$ is $c$-stable with respect to a measure $\mu_0$ which is either equal
or at least comparable to $\pi$.

• The Markov kernels $K_i$ or the elements of $\mathcal{Q}_N$ are all bounded perturbations
of $K$ in the sense that $K_i(x,y)/K(x,y)$ is bounded away from 0 and
away from $\infty$ for all $(x,y) \in V_N^2$. In particular, $K_i(x,y) = 0$ if and only if
$K(x,y) = 0$.

Under such circumstances, Theorem 2.5 (or Theorem 2.6) applies and
yields the conclusion that the time inhomogeneous Markov chain associated
with the sequence $K_i$ under investigation has a relative-sup merging time
$T_\infty(\eta)$ bounded by

\[ T_\infty(\eta) \le \frac{A'(1 + \log_+ 1/\eta)}{1 - \sigma_N} \]

for some constant $A'$ independent of $N$.

Fig. 1. The asymmetric perturbation.

The most obvious basic model is, perhaps, the simple random walk on
$\mathbb{Z}/N\mathbb{Z}$ (with some holding if $N$ is even to avoid periodicity). This model has
$1 - \sigma_N \simeq 1/N^2$ and satisfies the desired Nash inequality with $D = 1/4$. The
first subsection presents applications to a perturbation of this model.

3.1. Asymmetric perturbation at the middle vertex. In this example, $V_N =
\mathbb{Z}/p_N\mathbb{Z}$ is a finite circle. It will be convenient to enumerate the points in $V_N$
by writing $V_N = \{-(N-1),\dots,-1,0,1,\dots,(N-1),N\}$ if $p_N = 2N$ and
$V_N = \{-N,\dots,-1,0,1,\dots,N\}$ if $p_N = 2N+1$. The simple random walk in
$V_N$ has kernel

(3.1)  $$Q(x,y) = \begin{cases} 1/2, & \text{if } |x-y| = 1,\\ 0, & \text{otherwise,}\end{cases}$$

and reversible measure $u \equiv 1/p_N$. For any $\varepsilon > 0$, define the perturbation kernel

(3.2)  $$\Delta_\varepsilon(x,y) = \begin{cases} \varepsilon, & \text{if } (x,y) = (0,1),\\ -\varepsilon, & \text{if } (x,y) = (0,-1),\\ 0, & \text{otherwise.}\end{cases}$$

For $\varepsilon \in (-1/2,1/2)$, the Markov kernel $Q_\varepsilon = Q + \Delta_\varepsilon$ is a perturbation of Q.
See Figure 1.

For any fixed $0 < \varepsilon < 1/2$, set

$$\mathcal{Q}(\varepsilon) = \{Q_\delta : \delta \in [-\varepsilon,\varepsilon]\}.$$

We shall see below that $\mathcal{Q}(\varepsilon)$ is c-stable.
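As a quick sanity check on (3.1) and (3.2), the following sketch builds $Q_\delta = Q + \Delta_\delta$ on $\mathbb{Z}/p\mathbb{Z}$ and confirms that it is a Markov kernel. The function name `perturbed_circle_kernel` and the encoding of the vertex $-x$ as the index $p - x$ are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def perturbed_circle_kernel(p, delta):
    """Return Q_delta = Q + Delta_delta on Z/pZ as a p x p list of Fractions.

    Vertices are stored as 0..p-1, so the vertex -1 sits at index p-1.
    """
    half = Fraction(1, 2)
    K = [[Fraction(0)] * p for _ in range(p)]
    for x in range(p):
        K[x][(x + 1) % p] += half
        K[x][(x - 1) % p] += half
    # Only the two moves leaving the middle vertex 0 are perturbed.
    K[0][1 % p] += delta
    K[0][(-1) % p] -= delta
    return K

p = 7                                  # p_N = 2N + 1 with N = 3
Q = perturbed_circle_kernel(p, Fraction(1, 5))
rows_ok = all(sum(row) == 1 for row in Q)
nonneg = all(entry >= 0 for row in Q for entry in row)
```

Exact rational arithmetic is used so that the row sums can be compared with 1 without rounding issues.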

Definition 3.1. Let $S_N(\varepsilon)$ be the set of all probability measures $\mu$ on $V_N$
which satisfy the following two properties:

(1) for all $x \in V_N$, there exist constants $a_{\mu,x}$ such that $a_{\mu,x} = -a_{\mu,-x}$ and

$$\mu(x) = (1/p_N) + a_{\mu,x};$$

(2) for all $x \in V_N$ we have that $|a_{\mu,x}| \le 2\varepsilon/p_N$.

Remark 3.2. Note that we always have $a_{\mu,0} = 0$ (since $-0 = 0$) and, in
the case when $p_N = 2N$, $a_{\mu,N} = 0$.

Claim 3.3. Let $\mu \in S_N(\varepsilon)$ be as defined above. Then for any $K \in \mathcal{Q}(\varepsilon)$ we have
that $\mu K \in S_N(\varepsilon)$.

Proof. Let $\mu \in S_N(\varepsilon)$ and $K = Q_\delta \in \mathcal{Q}(\varepsilon)$, $\delta \in [-\varepsilon,\varepsilon]$. We show that
$\mu K$ has the properties required to be in $S_N(\varepsilon)$.

(1) Any measure $\mu \in S_N(\varepsilon)$ can be written as $\mu = u + m_\mu$ where $m_\mu$ is the
(nonprobability) measure $m_\mu(x) = a_{\mu,x}$. A simple calculation yields that

$$m_\mu Q(x) = (a_{\mu,x-1} + a_{\mu,x+1})/2.$$

Since $a_{\mu,x} = -a_{\mu,-x}$, we obtain that

$$m_\mu Q(x) = -m_\mu Q(-x) \quad\text{and}\quad m_\mu Q(0) = 0.$$

The fact that $\mu Q = (u + m_\mu)Q = u + m_\mu Q$ implies that $\mu Q$ satisfies property
(1) in the definition of $S_N(\varepsilon)$. To see that $\mu Q_\delta$ also satisfies this
property, we note that

$$\mu\Delta_\delta(x) = \begin{cases} \delta\mu(0), & \text{if } x = 1,\\ -\delta\mu(0), & \text{if } x = -1,\\ 0, & \text{otherwise.}\end{cases}$$

It now follows that $\mu Q_\delta$ has property (1) in the definition of $S_N(\varepsilon)$
since $\mu Q_\delta = \mu(Q + \Delta_\delta)$.

(2) We consider the measure $\mu K$. For $x \notin \{-1,1\}$ property (2) of $S_N(\varepsilon)$
follows easily from the fact that $|a_{\mu,x}| \le 2\varepsilon/p_N$ and

$$\mu K(x) = 1/p_N + \tfrac12(a_{\mu,x-1} + a_{\mu,x+1}).$$

For x = 1, we note that

$$\mu K(1) \le \mu(0)(1/2+\varepsilon) + \mu(2)(1/2) = 1/p_N + \varepsilon/p_N + (1/2)a_{\mu,2} \le 1/p_N + 2\varepsilon/p_N.$$

Similarly

$$\mu K(1) \ge \mu(0)(1/2-\varepsilon) + \mu(2)(1/2) = 1/p_N - \varepsilon/p_N + (1/2)a_{\mu,2} \ge 1/p_N - 2\varepsilon/p_N.$$

The proof now follows from the fact that $a_{\mu K,1} = -a_{\mu K,-1}$ as proved in part
(1) above. □

Claim 3.4. The family $\mathcal{Q}(\varepsilon)$ is $\frac{1+2\varepsilon}{1-2\varepsilon}$-stable with respect to any $\mu_0 \in
S_N(\varepsilon)$.

Proof. Claim 3.3 implies that for any sequence $(K_i)_1^\infty$ such that $K_i \in
\mathcal{Q}(\varepsilon)$ and any measure $\mu_0 \in S_N(\varepsilon)$ we have $\mu_n = \mu_0 K_{0,n} \in S_N(\varepsilon)$ for all $n \ge 0$.
Note that for any measure $\nu \in S_N(\varepsilon)$ we have that

$$\nu(x) = 1/p_N + a_{\nu,x} \le (1+2\varepsilon)/p_N \quad\text{and}\quad \nu(x) = 1/p_N + a_{\nu,x} \ge (1-2\varepsilon)/p_N.$$

Hence,

$$\frac{1-2\varepsilon}{1+2\varepsilon} \le \frac{\mu_n(x)}{\mu_0(x)} \le \frac{1+2\varepsilon}{1-2\varepsilon}. \qquad\square$$
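Claims 3.3 and 3.4 can be checked numerically on a small circle. The sketch below starts from an extreme element of $S_N(\varepsilon)$, applies randomly chosen kernels $Q_\delta$ with $\delta \in [-\varepsilon,\varepsilon]$, and tests both membership in $S_N(\varepsilon)$ and the ratio bound with $c = (1+2\varepsilon)/(1-2\varepsilon)$. The helper names `step` and `in_S`, the choice $p_N = 9$, and the grid of admissible $\delta$ values are assumptions made for illustration only.

```python
from fractions import Fraction
import random

def step(mu, delta):
    """One step mu -> mu Q_delta on Z/pZ; the vertex -x is stored at index p - x."""
    p = len(mu)
    half = Fraction(1, 2)
    new = [Fraction(0)] * p
    for z in range(p):
        up = half + delta if z == 0 else half      # only moves out of 0 are perturbed
        down = half - delta if z == 0 else half
        new[(z + 1) % p] += mu[z] * up
        new[(z - 1) % p] += mu[z] * down
    return new

def in_S(mu, eps):
    """Membership test for S_N(eps): antisymmetric deviations bounded by 2*eps/p."""
    p = len(mu)
    a = [m - Fraction(1, p) for m in mu]
    return (all(a[x] == -a[(-x) % p] for x in range(p))
            and all(abs(ax) <= 2 * eps / p for ax in a))

random.seed(0)
p, eps = 9, Fraction(1, 5)
N = p // 2
# An extreme starting measure in S_N(eps): +2*eps/p on 1..N and -2*eps/p on -1..-N.
mu0 = [Fraction(1, p)] * p
for x in range(1, N + 1):
    mu0[x] += 2 * eps / p
    mu0[p - x] -= 2 * eps / p

c = (1 + 2 * eps) / (1 - 2 * eps)
mu, stays_in_S, ratios_ok = mu0, True, True
for _ in range(40):
    delta = Fraction(random.randint(-20, 20), 100)   # delta in [-eps, eps]
    mu = step(mu, delta)
    stays_in_S = stays_in_S and in_S(mu, eps)
    ratios_ok = ratios_ok and all(1 / c <= m / m0 <= c for m, m0 in zip(mu, mu0))
```

A run of this kind is only evidence for one sequence of kernels; the claims themselves cover every sequence in $\mathcal{Q}(\varepsilon)$.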

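When $p_N = 2N$, the kernels $Q_\delta$ yield periodic chains on $V_N$. In this case,
we will study the merging properties of

$$\mathcal{Q}_{\mathrm{lazy}}(\varepsilon) = \{\tfrac12(I+K) : K \in \mathcal{Q}(\varepsilon)\},$$

that is, the so-called lazy version of $\mathcal{Q}(\varepsilon)$. We set

$$\tilde Q_\delta = \tfrac12(I + Q_\delta).$$

For any $\mu \in S_N(\varepsilon)$, we consider the kernel

$$P_{\delta,\mu}(x,y) = \frac{1}{\mu\tilde Q_\delta(x)} \sum_z \mu(z)\tilde Q_\delta(z,x)\tilde Q_\delta(z,y),$$

which is the kernel of $K^*K$ where $K = \tilde Q_\delta : \ell^2(\mu\tilde Q_\delta) \to \ell^2(\mu)$. This is 0 unless
$y = x, x\pm1, x\pm2$ and we compare it to

$$P(x,y) = P_{0,u}(x,y) = \frac{1}{u(x)}\sum_z u(z)\tilde Q_0(z,x)\tilde Q_0(z,y),$$

which is 3/8 if $y = x$, 1/4 if $y = x \pm 1$, 1/16 if $y = x \pm 2$ and 0 otherwise.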
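The values 3/8, 1/4 and 1/16 can be recovered directly from the definition of the unperturbed kernel. The sketch below computes $\sum_z \tilde Q_0(z,x)\tilde Q_0(z,y)$ for the lazy walk $\tilde Q_0 = \tfrac12(I+Q)$ on a circle with 9 points (the factor $u(z)/u(x)$ equals 1 because u is uniform). The function names are illustrative assumptions.

```python
from fractions import Fraction

def lazy_walk(p):
    """Lazy simple random walk (I + Q)/2 on Z/pZ."""
    K = [[Fraction(0)] * p for _ in range(p)]
    for x in range(p):
        K[x][x] += Fraction(1, 2)
        K[x][(x + 1) % p] += Fraction(1, 4)
        K[x][(x - 1) % p] += Fraction(1, 4)
    return K

def k_star_k(K):
    """Kernel of K*K with respect to the uniform measure: sum_z K(z,x) K(z,y)."""
    p = len(K)
    return [[sum(K[z][x] * K[z][y] for z in range(p)) for y in range(p)]
            for x in range(p)]

P = k_star_k(lazy_walk(9))
first_row = P[0][:4]      # entries P(0,0), P(0,1), P(0,2), P(0,3)
```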
The definitions of $\tilde Q_\delta$ and $S_N(\varepsilon)$ yield

M Q 5 (x)P v (x,y) > ii^Kl^! u (x)P(x,y) 



This yields 

(l + 2j) 
(1 - 2s) 



(3-3) £p,„(/,/) < -k^hds^ ^ t {f,f), 
whereas the stability property implies that the relevant measures $\mu\tilde Q_\delta$ and
u satisfy

(3.4)  $$(1-2\varepsilon) \le \frac{\mu\tilde Q_\delta(x)}{u(x)} \le (1+2\varepsilon).$$

In the case when $p_N = 2N+1$, we may work directly with the kernels $Q_\delta$
as they are not periodic. An analysis similar to that above will give versions
of (3.3) and (3.4) for $Q_\delta$.

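Applying the line of reasoning explained at the beginning of this section
and using Theorem 2.6, we get the following result.

Theorem 3.5. Fix $\varepsilon \in (0,1/2)$. For any $\eta > 0$ the total variation $\eta$-merging
time of the family $\mathcal{Q}_{\mathrm{lazy}}(\varepsilon)$ on $V_N = \mathbb{Z}/2N\mathbb{Z}$ [resp., $\mathcal{Q}(\varepsilon)$ on $V_N =
\mathbb{Z}/(2N+1)\mathbb{Z}$] is at most $B(\varepsilon)N^2(1+\log_+ 1/\eta)$ for some constant $B(\varepsilon) \in
(0,\infty)$. In fact, we can choose $B(\varepsilon)$ such that

$$\forall n \ge B(\varepsilon)N^2(1+\log_+ 1/\eta), \qquad \max_{x,y,z\in V_N}\left|\frac{K_{0,n}(x,z)}{K_{0,n}(y,z)} - 1\right| \le \eta$$

for any sequence $K_i \in \mathcal{Q}_{\mathrm{lazy}}(\varepsilon)$ [resp., $K_i \in \mathcal{Q}(\varepsilon)$].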



3.2. Perturbations of some birth and death chains. In [29], Nash inequalities
are used to study certain birth and death chains on $V_N = \{-N,\dots,0,\dots,N\}$
with reversible measures which belong to one of the following two
families:

$$\hat\pi_a(x) = \hat c(a,N)(N-|x|+1)^a, \qquad a \ge 0,$$

and

$$\check\pi_a(x) = \check c(a,N)(|x|+1)^a, \qquad a \ge 0.$$

Here, we consider $a \in [0,\infty)$ to be a fixed parameter and are interested in
what happens when N tends to infinity. From this perspective, the normalizing
constants $\hat c(a,N), \check c(a,N)$ are comparable and behave as

$$\hat c(a,N) \simeq \check c(a,N) \simeq N^{-a-1}.$$

Set

$$C(a,N) = \sum_{i=0}^{N}(1+i)^{-a} \simeq \begin{cases} 1, & \text{if } a > 1,\\ \log N, & \text{if } a = 1,\\ N^{-a+1}, & \text{if } a \in [0,1).\end{cases}$$

Here, all $\simeq$ must be understood for fixed a and the implied comparison
constants depend on a. Let $\hat M_a$ (resp., $\check M_a$) be the Markov kernel of the
Metropolis chain with basis the symmetric simple random walk on $V_N$ with
holding 1/3 at all points except at the end points where the holding is
2/3, and target $\hat\pi_a$ (resp., $\check\pi_a$). Let $\hat\lambda(a,N)$, $\check\lambda(a,N)$ be the corresponding
spectral gaps. Let $\hat T(a,N,\eta)$, $\check T(a,N,\eta)$ be the relative-sup mixing times of
these chains. It is proved in [29] that

$$\hat\lambda(a,N) \simeq 1/N^2, \qquad \hat T(a,N,\eta) \simeq N^2(1+\log_+ 1/\eta),$$

whereas

$$\check\lambda(a,N) \simeq \check c(a,N)/C(a,N),$$
$$\check T(a,N,\eta) \simeq N^2 + [\check c(a,N)/C(a,N)]^{-1}\log_+ 1/\eta.$$

Note that

$$\check c(a,N)/C(a,N) \simeq \begin{cases} N^{-(1+a)}, & \text{if } a > 1,\\ (N^2\log N)^{-1}, & \text{if } a = 1,\\ N^{-2}, & \text{if } a \in [0,1).\end{cases}$$

These results are based on the Nash inequalities satisfied by these chains.
Namely, letting $\mathcal{E}_a = \mathcal{E}_{\hat M_a,\hat\pi_a}$ or $\mathcal{E}_a = \mathcal{E}_{\check M_a,\check\pi_a}$ and $\pi_a = \hat\pi_a$ or $\pi_a = \check\pi_a$, there
are constants $A_a, a_a \in (0,\infty)$ such that

ll/llS^" < A a ^£ a (fJ) + ^ll/ll^) ll/II^I) 

with $D_a = 1 + a$. See [29].

In [32], the authors consider the class of birth and death chains Q
on $V_N = \{-N,\dots,0,\dots,N\}$ that are symmetric with respect to the middle
point, that is, satisfy $Q(x,x+1) = Q(-x,-x-1)$, $Q(x,x-1) = Q(-x,-x+1)$,
$Q(x,x) = Q(-x,-x)$, $x \in \{0,\dots,N\}$. For any such chain Q, let $\nu$ be the
reversible measure. It satisfies $\nu(x) = \nu(-x)$. Consider the perturbation set

$$\mathcal{Q}_N(Q,\varepsilon) = \{Q + \Delta_s : s \in [-\varepsilon,\varepsilon]\}, \qquad \varepsilon \in [0,q_0),$$

where $q_0 = Q(0,\pm1)$, $\Delta_s(0,\pm1) = \pm s$ and $\Delta_s(x,y) = 0$ otherwise. These
perturbations at the middle vertex have reversible measures $\nu_s$ that satisfy

$$\nu_s(0) = \nu(0), \qquad \nu_s(\pm x) = \nu(\pm x)(1 \pm s/q_0), \qquad x \in \{1,\dots,N\}.$$

The main point of this construction is the following.

Proposition 3.6. Fix $Q, \nu$ as above and $\varepsilon \in [0,q_0)$. The set $\mathcal{Q}_N(Q,\varepsilon)$
is c-stable with respect to $\mu_0 = \nu$ with $c = (q_0+\varepsilon)/(q_0-\varepsilon)$.
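The formula for $\nu_s$ can be verified by checking detailed balance. The sketch below does this for one concrete symmetric birth and death chain (steps of $\pm1$ with probability 1/3, so $q_0 = 1/3$ and $\nu$ is uniform); this base chain, the values $N = 4$, $s = 1/10$, $\varepsilon = 1/6$, and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction

N = 4
q0 = Fraction(1, 3)                      # q_0 = Q(0, +1) = Q(0, -1)
states = list(range(-N, N + 1))

def base(x, y):
    """Symmetric birth and death kernel: steps of +-1 w.p. 1/3, holding otherwise."""
    if abs(x - y) == 1:
        return Fraction(1, 3)
    if x == y:
        return Fraction(2, 3) if abs(x) == N else Fraction(1, 3)
    return Fraction(0)

def perturbed(s):
    """Q + Delta_s with Delta_s(0, +-1) = +-s."""
    def kernel(x, y):
        if x == 0 and y == 1:
            return base(x, y) + s
        if x == 0 and y == -1:
            return base(x, y) - s
        return base(x, y)
    return kernel

def nu(x):
    return Fraction(1, 2 * N + 1)        # reversible measure of the base chain

def nu_s(x, s):
    sign = (x > 0) - (x < 0)
    return nu(x) * (1 + sign * s / q0)   # nu_s(+-x) = nu(+-x)(1 +- s/q_0)

s = Fraction(1, 10)
Qs = perturbed(s)
reversible = all(nu_s(x, s) * Qs(x, y) == nu_s(y, s) * Qs(y, x)
                 for x in states for y in states)
total_mass = sum(nu_s(x, s) for x in states)
eps = Fraction(1, 6)
c = (q0 + eps) / (q0 - eps)
within_c = all(1 / c <= nu_s(x, s) / nu(x) <= c for x in states)
```

The last check only compares $\nu_s$ with $\nu$; c-stability in Proposition 3.6 is a statement about all products of kernels from the family.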

In order to apply this result to our examples $\hat M_a$, $\check M_a$, we observe that

$$\hat q_0(a) = \hat M_a(0,-1) = \frac13\left(\frac{N}{N+1}\right)^a$$

and

$$\check q_0(a) = \check M_a(0,-1) = \frac13.$$

Now, Theorem 2.6 yields the following result.

Theorem 3.7. Fix $a \in [0,\infty)$ and set $\hat\varepsilon_{N,a} = \frac16(N/(N+1))^a$, $\check\varepsilon_{N,a} =
1/6$.

1. There exists a constant A independent of N such that, for any sequence
$(K_i)_1^\infty$ with $K_i \in \mathcal{Q}_N(\hat M_a, \hat\varepsilon_{N,a})$, we have

$$T_\infty(\eta) \le AN^2(1+\log_+ 1/\eta).$$

2. There exists a constant A independent of N such that, for any sequence
$(K_i)_1^\infty$ with $K_i \in \mathcal{Q}_N(\check M_a, \check\varepsilon_{N,a})$, we have

$$T_\infty(\eta) \le A\begin{cases} N^2 + N^{1+a}\log_+ 1/\eta, & \text{if } a > 1,\\ N^2 + (N^2\log N)\log_+ 1/\eta, & \text{if } a = 1,\\ N^2(1+\log_+ 1/\eta), & \text{if } a \in [0,1).\end{cases}$$
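The values of $\hat q_0(a)$ and $\check q_0(a)$ used before Theorem 3.7 follow from the Metropolis rule. The sketch below assumes the standard Metropolis acceptance probability $\min\{1, \pi(y)/\pi(x)\}$ applied to a proposal that moves to each neighbour with probability 1/3, and an integer exponent a so that exact arithmetic is available; the function and variable names are hypothetical.

```python
from fractions import Fraction

def metropolis_at_zero(weight):
    """Return M(0, -1) for a Metropolis chain with target proportional to weight(x)
    and a proposal that moves to each neighbour with probability 1/3."""
    acceptance = min(Fraction(1), Fraction(weight(-1), weight(0)))
    return Fraction(1, 3) * acceptance

N, a = 10, 2

def peaked(x):
    return (N - abs(x) + 1) ** a      # first family, largest at the middle vertex

def valley(x):
    return (abs(x) + 1) ** a          # second family, smallest at the middle vertex

q_peaked = metropolis_at_zero(peaked)
q_valley = metropolis_at_zero(valley)
eps_peaked, eps_valley = q_peaked / 2, q_valley / 2   # the choices made in Theorem 3.7
```

With these choices $\varepsilon = q_0/2$, so the stability constant of Proposition 3.6 is $c = (q_0+\varepsilon)/(q_0-\varepsilon) = 3$ in both cases.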

4. Logarithmic Sobolev inequalities. This section develops the technique 
of logarithmic Sobolev inequality for time inhomogeneous finite Markov 
chains. It should be noted that the logarithmic Sobolev technique has been 
mostly applied in the literature in the context of continuous time chains. 
In [21], Miclo tackled the problem of adapting this technique to discrete 
time (time homogeneous) chains. There are two different ways to use log- 
arithmic Sobolev inequality for mixing estimates. One, the most powerful, 
provides results for relative-sup merging and is based on hypercontractivity. 
The other is based on entropy and only produces bounds for total variation 
merging. We will discuss and illustrate both approaches below in the context 
of time inhomogeneous chains. The entropy approach is already treated in 

m. 

4.1. Hypercontractivity. Recall that, for any positive probability distribution
$\mu$, a Markov kernel K can be thought of as a contraction

$$K_\mu : \ell^2(\mu') \to \ell^2(\mu) \qquad\text{for } \mu' = \mu K.$$

The adjoint $K_\mu^* : \ell^2(\mu) \to \ell^2(\mu')$ has kernel

$$K_\mu^*(x,y) = \frac{K(y,x)\mu(y)}{\mu'(x)}.$$

Set $P = K_\mu^* K_\mu : \ell^2(\mu') \to \ell^2(\mu')$. We define the logarithmic Sobolev constant

$$l(P) = \inf\left\{\frac{\mathcal{E}_{P,\mu'}(f,f)}{\mathcal{L}(f^2,\mu')} : \mathcal{L}(f^2,\mu') \ne 0,\ f \ne \text{constant}\right\},$$

where the $\ell^2$ relative entropy $\mathcal{L}(f^2,\nu)$ of a function f with respect to the
measure $\nu$ is defined by

$$\mathcal{L}(f^2,\nu) = \sum_{x\in V} f(x)^2 \log\left(\frac{f(x)^2}{\|f\|^2_{\ell^2(\nu)}}\right)\nu(x).$$

The following proposition is a slight generalization of [21], Proposition 2,
in that it allows for the necessary change of measure.

Proposition 4.1. Let K and $\mu$ be a Markov kernel and a probability
measure, respectively. For all $q_0 \ge 2$ and $q \le [1+l(P)]q_0$, we have

$$\|K_\mu\|_{\ell^{q_0}(\mu')\to\ell^{q}(\mu)} \le 1.$$

In order to prove the proposition above, we will need the following two
lemmas from [21].

Lemma 4.2 ([21], Lemma 3). Let $\nu$ be a probability measure. For all $q \ge
q_0 \ge 1$,

$$\|f\|_{\ell^q(\nu)} - \|f\|_{\ell^{q_0}(\nu)} \le \frac{q-q_0}{q\,q_0}\|f\|^{1-q}_{\ell^q(\nu)}\mathcal{L}(f^q,\nu).$$

Lemma 4.3 ([21], Lemma 4). Fix $\nu > 0$ and $q \ge 2$, then for any $t > 0$
and $-t \le s \le \nu t$ we have that

$$(t+s)^q \ge t^q + qt^{q-1}s + g(q,\nu)((t+s)^{q/2} - t^{q/2})^2,$$

where

$$g(q,\nu) = \frac{(1+\nu)^q - 1 - q\nu}{((1+\nu)^{q/2}-1)^2}.$$

The proof of Proposition 4.1 follows directly that of Proposition 2 in [21].
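The elementary inequality of Lemma 4.3, and the bound $g(q,\nu) \ge 1$ used later in the proof, can be explored numerically. The following sketch evaluates both sides on a grid of admissible values with $t + s = u \in [0,(1+\nu)t]$; the grid, the tolerance and the function names are arbitrary choices made for illustration, and a finite grid is of course not a proof.

```python
def g(q, nu):
    """The constant g(q, nu) of Lemma 4.3."""
    return ((1 + nu) ** q - 1 - q * nu) / ((1 + nu) ** (q / 2) - 1) ** 2

def relative_gap(q, nu, t, u):
    """(LHS - RHS) of Lemma 4.3, rescaled, with t + s = u so that s = u - t."""
    s = u - t
    lhs = u ** q
    rhs = t ** q + q * t ** (q - 1) * s + g(q, nu) * (u ** (q / 2) - t ** (q / 2)) ** 2
    return (lhs - rhs) / max(1.0, lhs, t ** q)

Q_VALUES = (2.0, 2.5, 3.0, 4.0, 6.0)
NU_VALUES = (0.5, 1.0, 3.0, 10.0)

worst_gap = min(
    relative_gap(q, nu, t, (1 + nu) * t * k / 200)   # u ranges over [0, (1 + nu) t]
    for q in Q_VALUES
    for nu in NU_VALUES
    for t in (0.5, 1.0, 2.0)
    for k in range(201)
)
smallest_g = min(g(q, nu) for q in Q_VALUES for nu in NU_VALUES)
```

Equality holds at the endpoint $s = \nu t$ by the definition of g, which is why the check allows a small negative rounding error.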

Proof of Proposition 4.1. To prove Proposition 4.1 it suffices to
only consider positive functions. For $f \ge 0$, we begin by writing

(4.1)  $$\|Kf\|_{\ell^q(\mu)} - \|f\|_{\ell^{q_0}(\mu')} = \|Kf\|_{\ell^q(\mu)} - \|f\|_{\ell^q(\mu')} + \|f\|_{\ell^q(\mu')} - \|f\|_{\ell^{q_0}(\mu')}.$$

The difference of the last two terms on the right-hand side is controlled by
Lemma 4.2. To control the first two terms, we will use the concavity result

$$\forall a,b \ge 0, \qquad a^{1/q} - b^{1/q} \le \frac1q b^{1/q-1}(a-b).$$

It follows that

$$\|Kf\|_{\ell^q(\mu)} - \|f\|_{\ell^q(\mu')} \le \frac1q\|f\|^{1-q}_{\ell^q(\mu')}\left(\|Kf\|^q_{\ell^q(\mu)} - \|f\|^q_{\ell^q(\mu')}\right).$$

Set

$$\nu(K) = \max\{1/K(x,y) : K(x,y) > 0\} - 1.$$

Following the notation of Lemma 4.3, fix $x,y \in V$ and set $\nu = \nu(K)$, $t =
Kf(x)$ and $t + s = f(y)$. If $K(x,y) > 0$, then $-t \le s \le \nu t$ and so

$$f(y)^q \ge Kf(x)^q + qKf(x)^{q-1}(f(y) - Kf(x)) + g(q,\nu(K))(f(y)^{q/2} - Kf(x)^{q/2})^2.$$

Fix x and integrate with respect to the measure $K(x,\cdot)$ to get

$$Kf^q(x) \ge (Kf(x))^q + g(q,\nu(K))\sum_{y\in V}K(x,y)(f(y)^{q/2} - Kf(x)^{q/2})^2.$$

We also have

$$\sum_{y\in V}K(x,y)(f^{q/2}(y) - (Kf(x))^{q/2})^2 \ge \min_{c\in\mathbb{R}}\sum_{y\in V}K(x,y)(f^{q/2}(y) - c)^2$$
$$= \sum_{y\in V}K(x,y)(f^{q/2}(y) - K(f^{q/2})(x))^2 = Kf^q(x) - (Kf^{q/2}(x))^2.$$

Hence,

$$Kf^q(x) \ge (Kf(x))^q + g(q,\nu(K))(Kf^q(x) - (Kf^{q/2}(x))^2).$$

Integrating with respect to $\mu$ gives us that

(4.2)  $$\|f\|^q_{\ell^q(\mu')} \ge \|Kf\|^q_{\ell^q(\mu)} + g(q,\nu(K))\mathcal{E}_{P,\mu'}(f^{q/2},f^{q/2}).$$

It follows from Lemma 4.2, (4.1) and (4.2) that

$$\|Kf\|_{\ell^q(\mu)} - \|f\|_{\ell^{q_0}(\mu')} \le \frac1q\|f\|^{1-q}_{\ell^q(\mu')}\left(\frac{q-q_0}{q_0}\mathcal{L}(f^q,\mu') - g(q,\nu(K))\mathcal{E}_{P,\mu'}(f^{q/2},f^{q/2})\right).$$

In [21], it is noted that for all $\nu > 0$ and $q \ge 2$ we have $g(q,\nu) \ge 1$. So if
$q \le [1+l(P)]q_0$ then $q \le [1+g(q,\nu(K))l(P)]q_0$. Hence,

$$\|Kf\|_{\ell^q(\mu)} - \|f\|_{\ell^{q_0}(\mu')} \le \frac1q\|f\|^{1-q}_{\ell^q(\mu')}g(q,\nu(K))\left(l(P)\mathcal{L}(f^q,\mu') - \mathcal{E}_{P,\mu'}(f^{q/2},f^{q/2})\right).$$

Since $l(P)$ is the logarithmic Sobolev constant, we get our desired result,

$$\|Kf\|_{\ell^q(\mu)} - \|f\|_{\ell^{q_0}(\mu')} \le 0. \qquad\square$$

Corollary 4.4. Let $(K_n)_1^\infty$ be a sequence of Markov kernels on a finite
set V and $\mu_0$ be an initial distribution on V. Set $\mu_n = \mu_0K_{0,n}$. Consider
$K_i : \ell^2(\mu_i) \to \ell^2(\mu_{i-1})$ and $P_i = K_i^*K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$. Let $l(P_i)$ be the
logarithmic Sobolev constant of $P_i$. Then for any $q_0 \ge 2$ and $q \le \prod_{i=1}^n(1 +
l(P_i))q_0$, we have that

$$\|K_{0,n}\|_{\ell^{q_0}(\mu_n)\to\ell^q(\mu_0)} \le 1.$$

Proof. When n = 2, set $q_1 = (1+l(P_2))q_0$ and $q_2 = (1+l(P_1))q_1$. It
follows from Proposition 4.1 that

$$\|K_{0,2}\|_{\ell^{q_0}(\mu_2)\to\ell^{q_2}(\mu_0)} \le \|K_2\|_{\ell^{q_0}(\mu_2)\to\ell^{q_1}(\mu_1)}\|K_1\|_{\ell^{q_1}(\mu_1)\to\ell^{q_2}(\mu_0)} \le 1.$$

The proof by induction follows similarly. □

We now relate the results above to bounds on merging times. 

Theorem 4.5. Let V be a finite set equipped with a sequence of Markov
kernels $(K_n)_1^\infty$ and an initial distribution $\mu_0$. Let $\mu_n = \mu_0K_{0,n}$. Consider
$K_i : \ell^2(\mu_i) \to \ell^2(\mu_{i-1})$ and $P_i = K_i^*K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$. Let $l(P_i)$ be the
logarithmic Sobolev constant of $P_i$. Set

$$m_x = \min\left\{t \in \mathbb{N} : \sum_{i=1}^t\log(1+l(P_i)) \ge \log\log(\mu_0(x)^{-1/2})\right\}.$$

Then for $n \ge m_x$, we have that

$$d_2(K_{0,n}(x,\cdot),\mu_n)^2 \le e^2\prod_{i=m_x+1}^n\sigma_1(K_i,\mu_{i-1})^2.$$
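To get a feeling for the size of the burn-in time $m_x$ in Theorem 4.5, the sketch below computes it from a list of logarithmic Sobolev constants. The function name `burn_in_time`, the convention that $m_x = 0$ when the right-hand side is nonpositive, and the sample values (constants equal to $1/40$ and $\mu_0(x) = 4^{-10}$, loosely modelled on a hypercube of dimension 20) are assumptions for illustration only.

```python
import math

def burn_in_time(log_sobolev_constants, mu0_x):
    """Smallest t with sum_{i<=t} log(1 + l(P_i)) >= log log(mu0_x ** -0.5).

    Returns 0 if the right-hand side is nonpositive, and None if the supplied
    constants run out before the threshold is reached.
    """
    inner = -0.5 * math.log(mu0_x)        # log(mu0_x ** -0.5), computed stably
    if inner <= 1.0:
        return 0
    target = math.log(inner)
    total = 0.0
    for t, l in enumerate(log_sobolev_constants, start=1):
        total += math.log1p(l)
        if total >= target:
            return t
    return None

constants = [1 / 40] * 500               # hypothetical values l(P_i) = 1/40
m_x = burn_in_time(constants, 4.0 ** -10)
```

When all $l(P_i)$ are of order $1/N$ and $\mu_0(x)$ is of order $4^{-N}$, the same computation gives $m_x$ of order $N\log N$, which is the scaling that appears in Theorem 4.16.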



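Proof. Fix x, and let $m = m_x$. If $0 \le m \le n$, $K_{0,n}^* = K_{m,n}^*K_{0,m}^*$. Indeed,
for any $f \in \ell^2(\mu_0)$ and $g \in \ell^2(\mu_n)$ we have that

$$\langle K_{0,n}g, f\rangle_{\mu_0} = \langle K_{m,n}g, K_{0,m}^*f\rangle_{\mu_m} = \langle g, K_{m,n}^*K_{0,m}^*f\rangle_{\mu_n}.$$

Moreover, if $\mu_m$ is thought of as the expectation operator $\mu_m : \ell^2(\mu_m) \to
\ell^2(\mu_n)$, $f \mapsto \mu_m(f)$, then $(K_{m,n} - \mu_n)^* = K_{m,n}^* - \mu_m$. Let

$$\delta_x(z) = \begin{cases} 1/\mu_0(z), & \text{if } z = x,\\ 0, & \text{otherwise.}\end{cases}$$

Set $q = q(m) = 2\prod_{i=1}^m(1+l(P_i))$ and $q'(m)$ to be the conjugate exponent of
$q(m)$ so that $1/q(m) + 1/q'(m) = 1$. By duality, we have

$$d_2(K_{0,n}(x,\cdot),\mu_n) = \left\|\frac{K_{0,n}^*(\cdot,x)}{\mu_0(x)} - 1\right\|_{\ell^2(\mu_n)} = \|(K_{0,n}^* - \mu_0)\delta_x\|_{\ell^2(\mu_n)} = \|(K_{m,n}^* - \mu_m)K_{0,m}^*\delta_x\|_{\ell^2(\mu_n)}$$
$$\le \|K_{m,n}^* - \mu_m\|_{\ell^2(\mu_m)\to\ell^2(\mu_n)}\|K_{0,m}^*\delta_x\|_{\ell^2(\mu_m)}$$
$$\le \|\delta_x\|_{\ell^{q'(m)}(\mu_0)}\|K_{0,m}^*\|_{\ell^{q'(m)}(\mu_0)\to\ell^2(\mu_m)}\|K_{m,n}^* - \mu_m\|_{\ell^2(\mu_m)\to\ell^2(\mu_n)}$$
$$\le \mu_0(x)^{-1/q(m)}\|K_{0,m}\|_{\ell^2(\mu_m)\to\ell^{q(m)}(\mu_0)}\|K_{m,n} - \mu_n\|_{\ell^2(\mu_n)\to\ell^2(\mu_m)}.$$

By assumption, we have that $q(m) \ge \log(\mu_0(x)^{-1})$, so that $\mu_0(x)^{-1/q(m)} \le e$. It now follows from
Corollary 4.4 that

$$d_2(K_{0,n}(x,\cdot),\mu_n) \le e\prod_{i=m+1}^n\sigma_1(K_i,\mu_{i-1}). \qquad\square$$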

4.2. Logarithmic Sobolev inequalities and c- stability. 

Theorem 4.6. Fix $c \in (1,\infty)$. Let V be a finite set equipped with a
sequence of irreducible Markov kernels, $(K_i)_1^\infty$. Assume that $(K_i)_1^\infty$ is c-stable
with respect to a positive probability measure $\mu_0$. For each i, set $\mu_0^i =
\mu_0K_i$ and let $\sigma_1(K_i,\mu_0)$ be the second largest singular value of the operator
$K_i : \ell^2(\mu_0^i) \to \ell^2(\mu_0)$ and $l(K_i^*K_i)$ the logarithmic Sobolev constant for the
operator $K_i^*K_i : \ell^2(\mu_0^i) \to \ell^2(\mu_0^i)$. If

$$\tilde m_x = \min\left\{t \in \mathbb{N} : \sum_{i=1}^t\log(1+c^{-2}l(K_i^*K_i)) \ge \log\log(\mu_0(x)^{-1/2})\right\},$$

then for $n \ge \tilde m_x$ we have that

$$d_2(K_{0,n}(x,\cdot),\mu_n)^2 \le e^2\prod_{i=\tilde m_x+1}^n\left(1 - c^{-2}(1-\sigma_1(K_i,\mu_0)^2)\right).$$

Proof. First, we note that $\mu_i/\mu_0^i \in [c^{-1},c]$. Let $P_i$ be the Markov kernel
described in the statement of Theorem 4.5. By the same arguments as in
Theorem 2.5, we get that for all $x,y \in V$

$$\mu_i(x)P_i(x,y) = \sum_z\mu_{i-1}(z)K_i(z,x)K_i(z,y) \ge c^{-1}\mu_0^i(x)K_i^*K_i(x,y).$$

A simple comparison argument similar to those used in the proof of Theorem
2.5 (see also [10, 12]) yields that

$$l(P_i) \ge c^{-2}l(K_i^*K_i) \quad\text{and}\quad 1 - \sigma_1(K_i,\mu_{i-1})^2 \ge c^{-2}(1-\sigma_1(K_i,\mu_0)^2).$$

The first inequality implies that $\tilde m_x \ge m_x$ where $m_x$ is defined in
Theorem 4.5. Using the results of Theorem 4.5 and the second inequality
above gives the desired result. □

The next result treats the case of a c-stability assumption on a family of
kernels.

Theorem 4.7. Let $c \in (1,\infty)$. Let $\mathcal{Q}$ be a family of irreducible aperiodic
Markov kernels on a finite set V. Assume that $\mathcal{Q}$ is c-stable with respect to
some positive probability measure $\mu_0$. Let $(K_i)_1^\infty$ be a sequence of Markov kernels
with $K_i \in \mathcal{Q}$ for all i. Let $\pi_i$ be the invariant measure of $K_i$. Let $\sigma_1(K_i)$
be the second largest singular value for the operator $K_i : \ell^2(\pi_i) \to \ell^2(\pi_i)$. Let
$l(K_i^*K_i)$ be the logarithmic Sobolev constant for the operator $K_i^*K_i$ where
$K_i^*$ is the adjoint of $K_i : \ell^2(\pi_i) \to \ell^2(\pi_i)$. If

$$\tilde m_x = \min\left\{t \in \mathbb{N} : \sum_{i=1}^t\log(1+c^{-4}l(K_i^*K_i)) \ge \log\log(\mu_0(x)^{-1/2})\right\},$$

then for $n \ge \tilde m_x$ we have that

$$d_2(K_{0,n}(x,\cdot),\mu_n)^2 \le e^2\prod_{i=\tilde m_x+1}^n\left(1 - c^{-4}(1-\sigma_1(K_i)^2)\right).$$

Proof. Let $\mu_i = \mu_0K_{0,i}$. If $\mathcal{Q}$ is c-stable, then $\mu_i/\pi_i \in [c^{-2},c^2]$. Similar
arguments to those used in Theorem 4.6 give the desired result. □

4.3. The relative sup norm. To control the relative-sup merging time
by this method, we need an additional hypothesis. In the case of the $\ell^2$
distance, we only required a control over the logarithmic Sobolev constant
of the kernel $P_i = K_i^*K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$. In this case, we will also need to
control the logarithmic Sobolev constant of $\tilde P_i = K_iK_i^* : \ell^2(\mu_{i-1}) \to \ell^2(\mu_{i-1})$
where $K_i^*$ is the adjoint of the operator $K_i$ from $\ell^2(\mu_i)$ to $\ell^2(\mu_{i-1})$.

Theorem 4.8. Let V be a finite set equipped with a sequence of Markov
kernels $(K_n)_1^\infty$ and an initial distribution $\mu_0$. Let $\mu_n = \mu_0K_{0,n}$ and $P_i =
K_i^*K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$ and $\tilde P_i = K_iK_i^* : \ell^2(\mu_{i-1}) \to \ell^2(\mu_{i-1})$ where $K_i^*$ is the
adjoint of $K_i$ with respect to the measure $\mu_i$. Let $l(P_i)$ and $l(\tilde P_i)$ be the
logarithmic Sobolev constants of $P_i$ and $\tilde P_i$, respectively. If $\mu_i^\# = \min_x\{\mu_i(x)\}$
and

$$m_0 = \min\left\{t \in \mathbb{N} : \sum_{i=1}^t\log(1+l(P_i)) \ge \log\log((\mu_0^\#)^{-1/2})\right\},$$

$$m_n = \min\left\{t \in \mathbb{N} : \sum_{i=n-t}^n\log(1+l(\tilde P_i)) \ge \log\log((\mu_n^\#)^{-1/2})\right\},$$

then for any $n \ge 2m$,

$$\max_{x,y}\left|\frac{K_{0,n}(x,y)}{\mu_n(y)} - 1\right| \le e^2\prod_{i=m+1}^{n-m}\sigma_1(K_i,\mu_{i-1}),$$

where $m = \max\{m_0, m_n\}$.

Remark 4.9. This innocent looking theorem is not easy to apply. For
instance, m depends on n and without some control on this dependence the
result is useless.



Proof of Theorem 4.8. Write

$$\max_{x,y}\left|\frac{K_{0,n}(x,y)}{\mu_n(y)} - 1\right| = \|K_{0,n} - \mu_n\|_{\ell^1(\mu_n)\to\ell^\infty(\mu_0)}$$

and

$$\|K_{0,n} - \mu_n\|_{\ell^1(\mu_n)\to\ell^\infty(\mu_0)} \le \|K_{n-m,n} - \mu_n\|_{\ell^1(\mu_n)\to\ell^2(\mu_{n-m})} \times \|K_{m,n-m} - \mu_{n-m}\|_{\ell^2(\mu_{n-m})\to\ell^2(\mu_m)} \times \|K_{0,m} - \mu_m\|_{\ell^2(\mu_m)\to\ell^\infty(\mu_0)}.$$

Note that

$$\|K_{m,n-m} - \mu_{n-m}\|_{\ell^2(\mu_{n-m})\to\ell^2(\mu_m)} \le \prod_{i=m+1}^{n-m}\sigma_1(K_i,\mu_{i-1}),$$

so we just need to bound the remaining terms in the right-hand side of the
inequality above. To bound $\|K_{n-m,n} - \mu_n\|_{\ell^1(\mu_n)\to\ell^2(\mu_{n-m})}$ set $q^* = q^*(m) =
2\prod_{i=1}^m(1+l(\tilde P_{n-m+i}))$ and write

$$\|K_{n-m,n} - \mu_n\|_{\ell^1(\mu_n)\to\ell^2(\mu_{n-m})} = \|K_{n-m,n}^* - \mu_{n-m}\|_{\ell^2(\mu_{n-m})\to\ell^\infty(\mu_n)}$$
$$= \|I(K_{n-m,n}^* - \mu_{n-m})\|_{\ell^2(\mu_{n-m})\to\ell^\infty(\mu_n)}$$
$$\le \|K_{n-m,n}^* - \mu_{n-m}\|_{\ell^2(\mu_{n-m})\to\ell^{q^*}(\mu_n)}\|I\|_{\ell^{q^*}(\mu_n)\to\ell^\infty(\mu_n)}$$
$$\le \|K_{n-m,n}^*\|_{\ell^2(\mu_{n-m})\to\ell^{q^*}(\mu_n)}\|I\|_{\ell^{q^*}(\mu_n)\to\ell^\infty(\mu_n)}.$$

It follows from Corollary 4.4 that

$$\|K_{n-m,n} - \mu_n\|_{\ell^1(\mu_n)\to\ell^2(\mu_{n-m})} \le \|I\|_{\ell^{q^*}(\mu_n)\to\ell^\infty(\mu_n)} \le (\mu_n^\#)^{-1/q^*}.$$

By assumption, we have that $q^* = q^*(m) \ge \log((\mu_n^\#)^{-1})$ so we get

$$\|K_{n-m,n} - \mu_n\|_{\ell^1(\mu_n)\to\ell^2(\mu_{n-m})} \le e.$$

To bound $\|K_{0,m} - \mu_m\|_{\ell^2(\mu_m)\to\ell^\infty(\mu_0)}$ set $q = q(m) = 2\prod_{i=1}^m(1+l(P_i))$ and
write

$$\|K_{0,m} - \mu_m\|_{\ell^2(\mu_m)\to\ell^\infty(\mu_0)} \le \|K_{0,m}\|_{\ell^2(\mu_m)\to\ell^q(\mu_0)}\|I\|_{\ell^q(\mu_0)\to\ell^\infty(\mu_0)}.$$

It follows from Corollary 4.4 that

$$\|K_{0,m} - \mu_m\|_{\ell^2(\mu_m)\to\ell^\infty(\mu_0)} \le \|I\|_{\ell^q(\mu_0)\to\ell^\infty(\mu_0)} \le (\mu_0^\#)^{-1/q}.$$

Since $q = q(m) \ge \log((\mu_0^\#)^{-1})$ we get $\|K_{0,m} - \mu_m\|_{\ell^2(\mu_m)\to\ell^\infty(\mu_0)} \le e$. □

Theorem 4.10. Fix $c \in (1,\infty)$. Let V be a finite set equipped with a
sequence of Markov kernels $(K_n)_1^\infty$. Assume that $(K_n)_1^\infty$ is c-stable with respect
to a positive probability measure $\mu_0$. For each i, set $\mu_0^i = \mu_0K_i$ and
$\mu_n^i = \mu_nK_i$. Let $\sigma_1(K_i,\mu_0)$ be the second largest singular value of the operator
$K_i : \ell^2(\mu_0^i) \to \ell^2(\mu_0)$. Let $l(K_i^*K_i)$ be the logarithmic Sobolev constant of
the operator $K_i^*K_i : \ell^2(\mu_0^i) \to \ell^2(\mu_0^i)$ where $K_i^*$ is the adjoint of $K_i : \ell^2(\mu_0^i) \to
\ell^2(\mu_0)$. Let $l(K_iK_i^*)$ be the logarithmic Sobolev constant of the operator
$K_iK_i^* : \ell^2(\mu_n) \to \ell^2(\mu_n)$ where $K_i^*$ is the adjoint of $K_i : \ell^2(\mu_n^i) \to \ell^2(\mu_n)$. If
$\mu_i^\# = \min_x\{\mu_i(x)\}$ and

$$\tilde m_0 = \min\left\{t \in \mathbb{N} : \sum_{i=1}^t\log(1+c^{-2}l(K_i^*K_i)) \ge \log\log((\mu_0^\#)^{-1/2})\right\},$$

$$\tilde m_n = \min\left\{t \in \mathbb{N} : \sum_{i=n-t}^n\log(1+c^{-6}l(K_iK_i^*)) \ge \log\log((\mu_n^\#)^{-1/2})\right\},$$

then for any $n \ge 2\tilde m$

$$\max_{x,y}\left|\frac{K_{0,n}(x,y)}{\mu_n(y)} - 1\right| \le e^2\prod_{i=\tilde m+1}^{n-\tilde m}\left(1 - c^{-2}(1-\sigma_1(K_i,\mu_0)^2)\right)^{1/2},$$

where $\tilde m = \max\{\tilde m_0, \tilde m_n\}$.

Proof. Note that $\mu_i/\mu_0^i \in [c^{-1},c]$ and $\mu_i/\mu_n^i \in [c^{-2},c^2]$. Let $P_i$ and $\tilde P_i$
be the Markov kernels described in Theorem 4.8 with kernels

(4.3)  $$P_i(x,y) = \frac{1}{\mu_i(x)}\sum_z\mu_{i-1}(z)K_i(z,x)K_i(z,y),$$

(4.4)  $$\tilde P_i(x,y) = \sum_z\frac{\mu_{i-1}(y)}{\mu_i(z)}K_i(x,z)K_i(y,z).$$

Similar reasoning to that of Theorem 4.6 gives

$$l(P_i) \ge c^{-2}l(K_i^*K_i) \quad\text{and}\quad 1 - \sigma_1(K_i,\mu_{i-1})^2 \ge c^{-2}(1-\sigma_1(K_i,\mu_0)^2),$$

where $K_i^*$ above is the adjoint of $K_i : \ell^2(\mu_0^i) \to \ell^2(\mu_0)$. This implies that
$\tilde m_0 \ge m_0$ where $m_0$ is defined in Theorem 4.8.
In the case of $\tilde P_i$, equation (4.4) gives

$$\tilde P_i(x,y) \ge c^{-4}\sum_z\frac{\mu_n(y)}{\mu_n^i(z)}K_i(x,z)K_i(y,z) = c^{-4}K_iK_i^*(x,y),$$

where $K_i^*$ is the adjoint of the operator $K_i : \ell^2(\mu_n^i) \to \ell^2(\mu_n)$. A simple comparison
argument yields

$$l(\tilde P_i) \ge c^{-6}l(K_iK_i^*)$$

and so $\tilde m_n \ge m_n$ where $m_n$ is defined in Theorem 4.8. The desired result
now follows from Theorem 4.8. □

The next theorem gives us similar results when we have c-stability for a 
family of kernels. 

Theorem 4.11. Fix $c \in (1,\infty)$. Let $\mathcal{Q}$ be a family of irreducible aperiodic
Markov kernels on V. Assume that $\mathcal{Q}$ is c-stable with respect to some
positive probability measure $\mu_0$. Let $(K_n)_1^\infty$ be a sequence with $K_i \in \mathcal{Q}$ for all
$i \ge 1$. Let $\pi_i$ be the invariant measure of $K_i$ and $\sigma_1(K_i)$ the second largest
singular value of the operator $K_i$ acting on $\ell^2(\pi_i)$. Let $l(K_i^*K_i)$ and $l(K_iK_i^*)$
be the logarithmic Sobolev constants of the operators $K_i^*K_i$ and $K_iK_i^*$ where
$K_i^*$ is the adjoint of $K_i : \ell^2(\pi_i) \to \ell^2(\pi_i)$. If $\mu_i^\# = \min_x\{\mu_i(x)\}$ and

$$\tilde m_0 = \min\left\{t \in \mathbb{N} : \sum_{i=1}^t\log(1+c^{-4}l(K_i^*K_i)) \ge \log\log((\mu_0^\#)^{-1/2})\right\},$$

$$\tilde m_n = \min\left\{t \in \mathbb{N} : \sum_{i=n-t}^n\log(1+c^{-6}l(K_iK_i^*)) \ge \log\log((\mu_n^\#)^{-1/2})\right\},$$

then for any $n \ge 2\tilde m$

$$\max_{x,y}\left|\frac{K_{0,n}(x,y)}{\mu_n(y)} - 1\right| \le e^2\prod_{i=\tilde m+1}^{n-\tilde m}\left(1 - c^{-4}(1-\sigma_1(K_i)^2)\right)^{1/2},$$

where $\tilde m = \max\{\tilde m_0, \tilde m_n\}$.

Proof. First, note that $\mu_i/\pi_i \in [c^{-2},c^2]$. Equation (4.3) implies that

$$l(P_i) \ge c^{-4}l(K_i^*K_i) \quad\text{and}\quad 1 - \sigma_1(K_i,\mu_{i-1})^2 \ge c^{-4}(1-\sigma_1(K_i)^2).$$

To bound $l(\tilde P_i)$, we use (4.4) to get that for all $x,y \in V$

$$\tilde P_i(x,y) \ge c^{-4}K_iK_i^*(x,y).$$

This implies that $l(\tilde P_i) \ge c^{-6}l(K_iK_i^*)$. It follows that $\tilde m \ge m$ where m is
defined in Theorem 4.8. Applying Theorem 4.8 now gives us the desired
result. □

4.4. An inhomogeneous walk on the hypercube. Denote by $V = \{0,1\}^{2N}$
the 2N-dimensional hypercube. We say that $x,y \in V$ are neighbors, or $x \sim y$,
if

$$\sum_{i=1}^{2N}|x_i - y_i| = 1,$$

where $x_i$ is the ith coordinate of $x \in V$. The simple random walk on V is
driven by the kernel

$$K(x,y) = \begin{cases}\dfrac{1}{2N}, & \text{if } x \sim y,\\ 0, & \text{otherwise.}\end{cases}$$

It is easy to check that $\mu$, the uniform measure on V, is stationary for K.
Fix $\varepsilon \in (0,1)$ and consider the following perturbed version of K. Writing
$|x| = \sum_i x_i$ for the number of coordinates of x equal to 1, let

$$K_\delta(x,y) = \begin{cases}\dfrac{1}{2N}, & \text{if } x \sim y \text{ and } |x| \ne N,\\[4pt] \dfrac{1+\delta}{2N}, & \text{if } x \sim y \text{ and } |x| = N, |y| = N+1,\\[4pt] \dfrac{1-\delta}{2N}, & \text{if } x \sim y \text{ and } |x| = N, |y| = N-1,\\[4pt] 0, & \text{otherwise.}\end{cases}$$

For $\varepsilon \in (0,1)$, set

$$\mathcal{Q}(\varepsilon) = \{K_\delta : \delta \in [-\varepsilon,\varepsilon]\}.$$

The example of time inhomogeneous Markov chains associated to $\mathcal{Q}(\varepsilon)$ above
is related to the binomial example in [32]. See Remark 4.17 below.

We shall show that $\mathcal{Q}(\varepsilon)$ is c-stable. First, consider the following definition.



Definition 4.12. Let $S_{2N}$ be the set of probability measures $\nu$ on $V = \{0,
1\}^{2N}$ that satisfy the following three properties:

(1) For all $x \in V$ with $|x| = N$ we have $\nu(x) = 1/4^N$.

(2) For all $i \in \{-N,\dots,-1,1,\dots,N\}$ there exist constants $a_{\nu,i}$ such that
$a_{\nu,i} = -a_{\nu,-i}$ and for any x with $|x| = N+i$ we have $\nu(x) = 1/4^N + a_{\nu,i}$.

(3) For all $i \in \{-N,\dots,-1,1,\dots,N\}$ we have $|a_{\nu,i}| \le \varepsilon/4^N$.

Claim 4.13. Let $\nu$ be in $S_{2N}$ defined above, then for any $K \in \mathcal{Q}(\varepsilon)$ we
have that $\nu K \in S_{2N}$.

Proof. Let $\nu \in S_{2N}$ and $Q \in \mathcal{Q}(\varepsilon)$, then $Q = K_\delta$ for some $\delta \in [-\varepsilon,\varepsilon]$.
We will check each condition needed for $\nu Q$ to be in $S_{2N}$ separately.

(1) For any x with $|x| = N$ we have that $\nu Q(x) = \nu K(x)$. The desired
result now follows from the definition of $S_{2N}$.

(2) For i such that $|i| \notin \{1,N\}$, consider an element x such that $|x| = N+i$.
An element x has $2N - |x|$ neighbors y with $|y| = |x|+1$ and $|x|$ neighbors y with $|y| = |x|-1$. Then

$$\nu Q(x) = \sum_{y\sim x,\ |y|=|x|+1}\nu(y)Q(y,x) + \sum_{y\sim x,\ |y|=|x|-1}\nu(y)Q(y,x)$$
$$= \frac{1}{2N}\left(\sum_{y\sim x,\ |y|=|x|+1}\nu(y) + \sum_{y\sim x,\ |y|=|x|-1}\nu(y)\right)$$
$$= \frac{1}{2N}\left(\left(\frac{1}{4^N} + a_{\nu,i+1}\right)(2N-|x|) + \left(\frac{1}{4^N} + a_{\nu,i-1}\right)|x|\right)$$
$$= \frac{1}{4^N} + \frac{1}{2N}\left(a_{\nu,i+1}(2N-|x|) + a_{\nu,i-1}|x|\right).$$

A similar computation as above yields that for an element x with $|x| = N-i$
we have

$$\nu Q(x) = \frac{1}{4^N} - \frac{1}{2N}\left(a_{\nu,i+1}|x| + a_{\nu,i-1}(2N-|x|)\right).$$

When $i = N$, and x is such that $|x| = N+i = 2N$, we have

$$\nu Q(x) = \sum_{|y|=2N-1}\nu(y)Q(y,x) = \frac{1}{4^N} + a_{\nu,N-1}.$$

When $i = -N$, and x is such that $|x| = N+i = 0$, we get $\nu Q(x) = \frac{1}{4^N} - a_{\nu,N-1}$
as desired.

Finally, we check the cases for elements x with $|x| = N \pm 1$. Consider an
x such that $|x| = N-1$, then

$$\nu Q(x) = \sum_{y\sim x,\ |y|=N-2}\nu(y)Q(y,x) + \sum_{y\sim x,\ |y|=N}\nu(y)Q(y,x) = \frac{1}{4^N} + \frac{1}{2N}\left(a_{\nu,-2}(N-1) - \frac{\delta(N+1)}{4^N}\right).$$

When $|x| = N+1$, then

$$\nu Q(x) = \sum_{y\sim x,\ |y|=N}\nu(y)Q(y,x) + \sum_{y\sim x,\ |y|=N+2}\nu(y)Q(y,x) = \frac{1}{4^N} + \frac{1}{2N}\left(\frac{\delta(N+1)}{4^N} + a_{\nu,2}(N-1)\right)$$

as desired. We can now conclude that $a_{\nu Q,i} = -a_{\nu Q,-i}$.

(3) From the calculations in part (2), we know that for x with $|x| = N+i$
and $|i| \notin \{1,N\}$ and $|i| = N$ we have

$$\nu Q(x) = \frac{1}{4^N} + \frac{1}{2N}\left(a_{\nu,i+1}(2N-|x|) + a_{\nu,i-1}|x|\right)$$

and

$$\nu Q(x) = \frac{1}{4^N} \pm a_{\nu,N-1},$$

respectively. It follows from the fact that for all i, $|a_{\nu,i}| \le \varepsilon/4^N$ that for
both cases above $|a_{\nu Q,i}| \le \varepsilon/4^N$. When $|i| = 1$, we have that for x with
$|x| = N+i = N\pm1$

$$\nu Q(x) \le \frac{1}{4^N} + \frac{1}{2N}\left(\frac{\varepsilon(N-1)}{4^N} + \frac{\varepsilon(N+1)}{4^N}\right) = \frac{1+\varepsilon}{4^N}.$$

A similar calculation yields $\nu Q(x) \ge \frac{1-\varepsilon}{4^N}$. The proof now follows from the
fact that $a_{\nu Q,i} = -a_{\nu Q,-i}$. □

Claim 4.14. The set $\mathcal{Q}(\varepsilon)$ is $\frac{1+\varepsilon}{1-\varepsilon}$-stable with respect to any measure in
$S_{2N}$.

Proof. Let $\mu_0 \in S_{2N}$. Let $(K_i)_1^\infty$ be any sequence of kernels such that
$K_i \in \mathcal{Q}(\varepsilon)$ for all $i \ge 1$. Let $\mu_n = \mu_0K_{0,n}$, then by Claim 4.13 we have that
$\mu_n \in S_{2N}$ and so for any $x \in V$

$$\frac{1-\varepsilon}{1+\varepsilon} \le \frac{\mu_n(x)}{\mu_0(x)} \le \frac{1+\varepsilon}{1-\varepsilon}. \qquad\square$$

The kernels $K_\delta \in \mathcal{Q}(\varepsilon)$ drive periodic chains that will alternate between
points with an even number of 1's and odd number of 1's. So we will study the
following random walk driven by the kernel

$$Q_\delta = \tfrac12(I + K_\delta),$$

where I is the identity. Set

$$\widetilde{\mathcal{Q}}(\varepsilon) = \{Q_\delta : \delta \in [-\varepsilon,\varepsilon]\}.$$
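Claims 4.13 and 4.14 can be tested on the smallest nontrivial hypercube, $N = 2$ (16 states). The sketch below encodes states as bit masks, applies randomly chosen kernels $K_\delta$ starting from the uniform measure, and checks the three defining properties of $S_{2N}$ together with the ratio bound $c = (1+\varepsilon)/(1-\varepsilon)$. All names, the value $\varepsilon = 1/2$ and the grid of $\delta$ values are illustrative assumptions.

```python
from fractions import Fraction
import random

N = 2
dim = 2 * N
size = 1 << dim                                       # 4**N states, encoded as bit masks
weight = [bin(x).count("1") for x in range(size)]     # |x| = number of ones

def k_delta(x, y, delta):
    """Perturbed kernel K_delta(x, y); zero unless x and y differ in one coordinate."""
    if bin(x ^ y).count("1") != 1:
        return Fraction(0)
    if weight[x] != N:
        return Fraction(1, dim)
    return (1 + delta) / dim if weight[y] == N + 1 else (1 - delta) / dim

def step(nu, delta):
    """One step nu -> nu K_delta."""
    new = [Fraction(0)] * size
    for x in range(size):
        for k in range(dim):
            y = x ^ (1 << k)
            new[y] += nu[x] * k_delta(x, y, delta)
    return new

def in_S2N(nu, eps):
    """Check properties (1)-(3) of Definition 4.12."""
    base = Fraction(1, size)
    by_level = {}
    for x in range(size):
        by_level.setdefault(weight[x], set()).add(nu[x] - base)
    if any(len(values) != 1 for values in by_level.values()):
        return False                                  # nu must depend on |x| only
    a = {w - N: next(iter(values)) for w, values in by_level.items()}
    return (a[0] == 0
            and all(a[i] == -a[-i] for i in a)
            and all(abs(v) <= eps * base for v in a.values()))

random.seed(1)
eps = Fraction(1, 2)
c = (1 + eps) / (1 - eps)
mu0 = [Fraction(1, size)] * size
mu, invariant, ratios_ok = mu0, True, True
for _ in range(25):
    delta = Fraction(random.randint(-50, 50), 100)    # delta in [-eps, eps]
    mu = step(mu, delta)
    invariant = invariant and in_S2N(mu, eps)
    ratios_ok = ratios_ok and all(1 / c <= m / m0 <= c for m, m0 in zip(mu, mu0))
```

Since the lazy kernels $Q_\delta$ are averages of the identity and $K_\delta$, the same invariance carries over to them.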



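Claim 4.15. Let $(K_i)_1^\infty$ be a sequence of Markov kernels such that $K_i \in
\widetilde{\mathcal{Q}}(\varepsilon)$ for all $i \ge 1$. Let $\mu_0 \in S_{2N}$ be a positive measure, and let $\mu_n = \mu_0K_{0,n}$.
Set $P_i = K_i^*K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$ where $K_i^*$ is the adjoint of $K_i : \ell^2(\mu_i) \to
\ell^2(\mu_{i-1})$. Let $\sigma_1(K_i,\mu_i)$ be the second largest singular value of $K_i : \ell^2(\mu_i) \to
\ell^2(\mu_{i-1})$. Let $l(P_i)$ be the logarithmic Sobolev constant of $P_i$. Then

$$\sigma_1(K_i,\mu_i) \le 1 - \frac{C(\varepsilon)}{2N} \quad\text{and}\quad l(P_i) \ge \frac{C(\varepsilon)}{4N},$$

where $C(\varepsilon) = (1+\varepsilon)^{-2}(1-\varepsilon)^4$.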

Proof. Let $Q = 2^{-1}(I + K_0)$ and u be the uniform measure on $\{0,1\}^{2N}$.
Let $P_i = K_i^*K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$. Using the $\frac{1+\varepsilon}{1-\varepsilon}$-stability of the sequence
$(\mu_n)_0^\infty$ we get that

$$\mu_i(x)P_i(x,y) = \sum_z\mu_{i-1}(z)K_i(z,x)K_i(z,y)$$
$$\ge \frac{1-\varepsilon}{1+\varepsilon}\,\frac{u(x)}{u(x)}\sum_z u(z)K_i(z,x)K_i(z,y)$$
$$\ge \frac{(1-\varepsilon)^3}{1+\varepsilon}\,\frac{u(x)}{u(x)}\sum_z u(z)Q(z,x)Q(z,y)$$
$$\ge \frac{(1-\varepsilon)^3}{1+\varepsilon}u(x)Q^{(2)}(x,y).$$

A simple comparison yields

$$\mathcal{E}_{P_i,\mu_i}(f,f) \ge (1-\varepsilon)^3(1+\varepsilon)^{-1}\mathcal{E}_{Q^{(2)},u}(f,f).$$

Further comparison gives that

(4.5)  $$1 - \sigma_1(K_i,\mu_i) \ge C(\varepsilon)(1-\sigma_1(Q)),$$

(4.6)  $$l(P_i) \ge C(\varepsilon)l(Q^{(2)}).$$

It is well known that for $K_0 : \ell^2(u) \to \ell^2(u)$ (the simple random walk) we
have $2l(K_0) = 1 - \sigma_1(K_0) = 1/N$. This implies that $\sigma_1(Q) = 1 - 1/2N$. The
singular value inequality in Claim 4.15 now follows from (4.5). For the rest
of the proof, we note that Lemma 2.5 of [11] tells us that $\mathcal{E}_{Q^{(2)},u}(f,f) \ge
\mathcal{E}_{Q,u}(f,f)$, and so we get $l(Q^{(2)}) \ge l(Q)$. The logarithmic Sobolev inequality
now follows from (4.6) and the fact that $l(Q) = 1/4N$. □
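The spectral values quoted in the proof can be illustrated with an explicit eigenfunction. The sketch below checks, for $N = 3$, that the character $f(x) = (-1)^{x_1}$ satisfies $K_0f = (1 - 1/N)f$ and $Qf = (1 - 1/2N)f$ with $Q = \tfrac12(I + K_0)$. It only confirms that these numbers are eigenvalues; that they are the second largest ones is the classical fact cited in the text. The function names are illustrative assumptions.

```python
from fractions import Fraction

N = 3
dim = 2 * N
size = 1 << dim                      # states of {0,1}^{2N} encoded as bit masks

def f(x):
    return 1 - 2 * (x & 1)           # the character (-1)^{x_1}

def apply_K0(func, x):
    """(K_0 func)(x) for the simple random walk on the hypercube."""
    return sum(Fraction(func(x ^ (1 << k)), dim) for k in range(dim))

def apply_lazy(func, x):
    """(Q func)(x) for the lazy walk Q = (I + K_0)/2."""
    return Fraction(1, 2) * (func(x) + apply_K0(func, x))

k0_eigen = all(apply_K0(f, x) == (1 - Fraction(1, N)) * f(x) for x in range(size))
lazy_eigen = all(apply_lazy(f, x) == (1 - Fraction(1, 2 * N)) * f(x) for x in range(size))
```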

By applying Theorem 4.5 and Claim 4.15, we get the following theorem. 

Theorem 4.16. For any $\varepsilon \in (0,1)$ there exists a constant $D(\varepsilon)$ such
that the total variation merging time of the sequence $(K_i)_1^\infty$ with $K_i \in \widetilde{\mathcal{Q}}(\varepsilon)$
for all $i \in \{1,2,\dots\}$ is bounded by

$$T_{\mathrm{TV}}(\eta) \le D(\varepsilon)N(\log N + \log_+ 1/\eta).$$

Moreover, we can choose $D(\varepsilon)$ such that

$$\forall n \ge D(\varepsilon)N(\log N + \log_+ 1/\eta), \qquad \max_{x,y,z\in V}\left|\frac{K_{0,n}(x,z)}{K_{0,n}(y,z)} - 1\right| \le \eta.$$

We note that the relative-sup merging time bound is obtained with the
same arguments as those used at the end of the proof of Theorem 2.4.

Remark 4.17. The theorem above is closely related to the example in
Section 5.2 of [32] which studies a time inhomogeneous chain on $\{-N,\dots,N\}$
resulting from perturbations of a birth and death chain with binomial stationary
distribution. Both [32] and Theorem 4.16 give the correct upper
bound on the merging time, yet [32] requires knowledge about the entire
spectrum of the operators driving the chain while the theorem above uses
logarithmic Sobolev techniques.

4.5. Modified logarithmic Sobolev inequalities and entropy. Let $\nu$ and
$\mu > 0$ be two probability measures on V. Define the relative entropy between
$\mu$ and $\nu$ as

$$\mathrm{Ent}_\mu(\nu) = \sum_{x\in V}\nu(x)\log\left(\frac{\nu(x)}{\mu(x)}\right).$$

It is well known that $\sqrt2\|\mu - \nu\|_{\mathrm{TV}} \le \sqrt{\mathrm{Ent}_\mu(\nu)}$. Let $(K_n)_1^\infty$ be a sequence of
Markov kernels on V, $\mu_0$ be some initial distribution on V and $\mu_n = \mu_0K_{0,n}$.
It follows by the triangle inequality that for any $x,y \in V$

$$\|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} \le \sqrt2\max_x\sqrt{\mathrm{Ent}_{\mu_n}(K_{0,n}(x,\cdot))}.$$

Let $\alpha = \alpha(K,\nu)$ be the largest constant such that for any probability
measure $\mu$

$$\mathrm{Ent}_{\nu K}(\mu K) \le (1-\alpha)\mathrm{Ent}_\nu(\mu).$$

Let $\mu' = \mu K$ and $K^* : \ell^2(\mu) \to \ell^2(\mu')$ be the adjoint of $K : \ell^2(\mu') \to \ell^2(\mu)$. Set

$$P = KK^* : \ell^2(\mu) \to \ell^2(\mu).$$

In [7], the contraction constant $\alpha$ is related to the so-called modified logarithmic
Sobolev constant

$$l'(P) = \inf\left\{\frac{\mathcal{E}_{P,\mu}(f^2,\log f^2)}{\mathcal{L}(f^2,\mu)} : \mathcal{L}(f^2,\mu) \ne 0,\ f \ne \text{constant}\right\}.$$

Proposition 4.18 ([7], Proposition 5.1). There exists a universal constant
$0 < \rho < 1$ such that for any Markov kernel K and any probability measure
$\mu$,

$$\rho\,l'(P) \le \alpha(K,\mu) \le l'(P),$$

where $P = KK^*$ and $K^*$ is the adjoint of the operator $K : \ell^2(\mu') \to \ell^2(\mu)$,
$\mu' = \mu K$.
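The inequality $\sqrt2\|\mu - \nu\|_{\mathrm{TV}} \le \sqrt{\mathrm{Ent}_\mu(\nu)}$ quoted at the start of this subsection (Pinsker's inequality) and the identity $\mathrm{Ent}_\mu(\delta_x) = \log(1/\mu(x))$ for a point mass can be spot-checked on random distributions. The sketch below uses the convention $\|\mu-\nu\|_{\mathrm{TV}} = \tfrac12\sum_x|\mu(x)-\nu(x)|$; the helper names, sample size and tolerance are arbitrary illustrative choices.

```python
import math
import random

def total_variation(mu, nu):
    """Total variation distance, equal to half the l^1 distance."""
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

def relative_entropy(nu, mu):
    """Ent_mu(nu) = sum_x nu(x) log(nu(x)/mu(x)), with 0 log 0 = 0 and mu > 0."""
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

def random_distribution(size, rng):
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
pinsker_holds = True
for _ in range(1000):
    mu = random_distribution(6, rng)
    nu = random_distribution(6, rng)
    lhs = math.sqrt(2) * total_variation(mu, nu)
    rhs = math.sqrt(max(relative_entropy(nu, mu), 0.0))
    pinsker_holds = pinsker_holds and lhs <= rhs + 1e-12

# Entropy of a point mass at the first state relative to the last sampled mu.
point_mass_entropy = relative_entropy([1.0] + [0.0] * 5, mu)
```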

Proposition 4.19. Referring to the proposition above,

$$\rho \ge \log 2\left(\frac{1-\log 2}{2}\right).$$

Proof. The proof of Proposition 5.1 in [7] uses the fact that there exists
some $0 < \rho < 1$ such that for all $x \in [-1,\infty)$

$$0 \le \varphi(x) \le \rho^{-1}\varphi(x/2),$$

where

$$\varphi(x) = (1+x)\log(1+x) - x.$$

Let $f(x) = \varphi(x) - (2/(1-\log 2))\varphi(x/2)$. We will show that $f(x) \le 0$ for all $x \in [-1,\infty)$.
By differentiating f we get

$$f'(x) = \log(1+x) - \left(\frac{1}{1-\log 2}\right)\log\left(\frac{2+x}{2}\right)$$

and

$$f'''(x) = \frac{4\log 2\,(1+x) + x^2\log 2 - 3 - 2x}{(1+x)^2(2+x)^2(1-\log 2)}.$$

In particular, for $x \in [-1,0]$ we have $f'''(x) < 0$. This along with the fact
that

$$f'(-0.9) < 0, \qquad f'(-0.1) > 0 \quad\text{and}\quad f'(0) = 0$$

implies that there exists only one $z \in (-1,0)$ such that $f'(z) = 0$. It follows
that f is decreasing on $[-1,z]$ and f is increasing on $[z,0]$. Since $f(-1) =
f(0) = 0$, then for $x \in [-1,0]$ we have that $f(x) \le 0$.
For $x \in [0,\infty)$, we note that

$$f''(x) = \frac{1}{1+x} - \frac{1}{(1-\log 2)(2+x)} \le 0,$$

which implies that $f'(x) \le f'(0) = 0$. The fact that $f(x) \le f(0) = 0$ implies that
we can take $\rho^{-1} = 2/(1-\log 2)$, that is, $\rho = (1-\log 2)/2$. The desired result follows from the fact that the proof of
Proposition 5.1 in [7] shows that

$$\alpha(K,\mu) \ge \rho\log(2)\,l'(P). \qquad\square$$
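The key inequality $\varphi(x) \le \frac{2}{1-\log 2}\varphi(x/2)$ on $[-1,\infty)$ can be examined numerically. The sketch below evaluates $f(x) = \varphi(x) - \frac{2}{1-\log 2}\varphi(x/2)$ on a grid covering $[-1,50]$, with $\varphi(-1) = 1$ by continuity. The grid range, step and tolerance are arbitrary assumptions, and a grid check is only a sanity test of the analytic argument above.

```python
import math

RHO_INV = 2 / (1 - math.log(2))        # the value of 1/rho obtained in the proof

def phi(x):
    """phi(x) = (1 + x) log(1 + x) - x, extended by continuity at x = -1."""
    if x <= -1.0:
        return 1.0
    return (1 + x) * math.log1p(x) - x

def f(x):
    return phi(x) - RHO_INV * phi(x / 2)

grid = [-1 + k / 1000 for k in range(0, 51001)]   # covers [-1, 50]
worst = max(f(x) for x in grid)                   # should not exceed 0 (up to rounding)
derivative_signs = (
    math.log(0.1) - math.log(0.55) / (1 - math.log(2)),   # f'(-0.9)
    math.log(0.9) - math.log(0.95) / (1 - math.log(2)),   # f'(-0.1)
)
```

The two derivative values reproduce the signs $f'(-0.9) < 0$ and $f'(-0.1) > 0$ used in the proof.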

The results in [7] allow us to study merging via logarithmic Sobolev con- 
stants. 

Proposition 4.20. Let V be a finite state space equipped with a sequence
of Markov kernels $(K_n)_1^\infty$ and an initial distribution $\mu_0$. Let $\mu_n =
\mu_0K_{0,n}$ and $P_i = K_iK_i^* : \ell^2(\mu_{i-1}) \to \ell^2(\mu_{i-1})$ where $K_i^*$ is the adjoint of $K_i : \ell^2(\mu_i) \to
\ell^2(\mu_{i-1})$. Set $\mu_0^\# = \min_x\mu_0(x)$, then for any $x,y \in V$

$$\|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} \le \sqrt2\left(\log\frac{1}{\mu_0^\#}\right)^{1/2}\prod_{i=1}^n(1-\rho\,l'(P_i))^{1/2},$$

where $\rho$ is given in Propositions 4.18 and 4.19.

Proof. We note that for any $x,y \in V$

$$\|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{\mathrm{TV}} \le \sqrt2\max_x\sqrt{\mathrm{Ent}_{\mu_n}(K_{0,n}(x,\cdot))}.$$

Proposition 5.1 in [7] gives that

$$\mathrm{Ent}_{\mu_n}(K_{0,n}(x,\cdot)) \le \mathrm{Ent}_{\mu_0}(\delta_x)\prod_{i=1}^n(1-\rho\,l'(P_i)).$$

The desired result now follows from the fact that

$$\mathrm{Ent}_{\mu_0}(\delta_x) = \log\left(\frac{1}{\mu_0(x)}\right) \le \log\frac{1}{\mu_0^\#}. \qquad\square$$

4.6. Biased shuffles. In this section, we present two examples where the
modified logarithmic Sobolev inequality technique yields the correct merging
time while the regular logarithmic Sobolev inequality technique does not.
Let $V_n = S_n$ be the symmetric group equipped with the uniform probability
measure u. Let $Q_i$ be the kernel of transpose i with random, that is,

$$Q_i(x,y) = \begin{cases} 1/n, & \text{if } x^{-1}y = (i,j) \text{ for } j \in [1,n],\\ 0, & \text{otherwise.}\end{cases}$$

Let $\tilde Q_i = 2^{-1}(I + Q_i)$ be the associated lazy chain. It is known that the lazy
chain has a mixing time of $2n\log n$. More precisely,

$$t \ge 2n(\log n + c) \implies \max_{y}\left|\frac{\tilde Q_i^t(x,y)}{u(y)} - 1\right| \le 2e^{-2c} \qquad \forall x \in S_n.$$

See, for example, [31]. The results of [18] show that the modified logarithmic
Sobolev constant for $\tilde Q_i$ is bounded by

> l'(Qi) > 



n-\~ v ^ /_ 4(n-l)' 

Set $\mathcal{Q} = \{Q_i, i = 1, \dots, n\}$. Since all $Q_i$ are reversible with respect to the
uniform distribution $u$, the set $\mathcal{Q}$ is 1-stable with respect to $u$. Using the
methods of [30] (see also [17, 24]), one can prove that for any sequence $(K_i)_1^\infty$
with $K_i \in \mathcal{Q}$ for all $i \ge 1$ we have

\[ t \ge 2n(\log n + c) \implies \max_{x,y}\left\{\left|\frac{K_{0,t}(x,y)}{u(y)} - 1\right|\right\} \le 2e^{-2c} \quad \forall x \in S_n. \]

The inequality above is due to the fact that the $Q_i$ are driven by probability
measures so the $\ell^2$ distance bounds the $\ell^\infty$ distance and the eigenvectors in
Theorem 3.2 of [32] drop out to give

\[ d_2(K_{0,t}(x,\cdot),u)^2 \le \sum_{i=1}^{n!-1} \prod_{j=1}^{t} \sigma_i(K_j)^2. \tag{4.7} \]

One can then group the singular values in (4.7) since the $Q_i$'s are all
images of each other under some inner automorphism of $S_n$, which
implies $\sigma_j(Q_i) = \sigma_j(Q_k)$ for all $i, j, k$. For a more detailed discussion, see [30].
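
For small $n$ the statement about singular values can be observed directly. The following sketch builds the lazy transpose $i$ with random kernels on $S_n$ as explicit matrices and compares their singular values. It is an illustration under stated assumptions: positions are indexed from 0, the move $x \mapsto x(i,j)$ is implemented by swapping the entries of $x$ at positions $i$ and $j$, and the function names `transpose_with_random` and `lazy` are hypothetical.

```python
import itertools
import numpy as np


def transpose_with_random(n, i):
    """Kernel on S_n: from x, swap position i with a uniformly chosen position j.

    The choice j = i leaves x unchanged, so each row puts mass 1/n on n targets.
    """
    perms = list(itertools.permutations(range(n)))
    index = {p: k for k, p in enumerate(perms)}
    Q = np.zeros((len(perms), len(perms)))
    for x in perms:
        for j in range(n):
            y = list(x)
            y[i], y[j] = y[j], y[i]
            Q[index[x], index[tuple(y)]] += 1.0 / n
    return Q


def lazy(Q):
    """Lazy version (I + Q) / 2 of a Markov kernel Q."""
    return 0.5 * (np.eye(Q.shape[0]) + Q)


n = 4  # 4! = 24 states; the matrices grow like n!, so keep n small
kernels = [lazy(transpose_with_random(n, i)) for i in range(n)]

# Singular values of each lazy kernel, sorted in decreasing order.
sing = [np.linalg.svd(Q, compute_uv=False) for Q in kernels]
for i, s in enumerate(sing):
    print(i, s[:3])
```

Each kernel is obtained from any other by relabeling positions, which is why the printed lists coincide.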

We now consider two variants of this example that cannot be treated 
using the singular values techniques of [17, 30, 32] or the logarithmic Sobolev 
inequality technique of Sections 4.1-4.4 but where the modified logarithmic 
Sobolev inequality does yield a successful analysis. This technique can be 
applied to the two examples in this section because of the following three 
reasons: 

(1) any sequence $(K_i)_1^\infty$ of interest can be shown to be $c$-stable with respect
to some well chosen initial distribution;

(2) all the kernels $K_i$ driving the time inhomogeneous process are directly
comparable to the $Q_j$'s and,

(3) due to (1) and the laziness of the $Q_i$'s we can successfully estimate
the modified logarithmic Sobolev constants $l'(Q_i Q_i^*) = l'(Q_i^2)$ to be of
order $1/n$.

4.6.1. Symmetric perturbations in $S_n$. For the first variant, fix $\epsilon \in (0,1)$
and consider the set $\mathcal{Q}^{\#}(\epsilon)$ of all Markov kernels $K$ on $S_n$ such that:

(a) $K(x,y) = K(y,x)$ (symmetry) and

(b) for all $x, y$ we have $(1-\epsilon)Q_i(x,y) \le K(x,y) \le (1+\epsilon)Q_i(x,y)$ for some
$i \in \{1, \dots, n\}$.
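
One way to produce a kernel satisfying (a) and (b) is sketched below, followed by a numerical look at its second singular value. This is a minimal illustration under stated assumptions: the perturbation is a random symmetric multiplicative factor in $[1-\epsilon, 1+\epsilon]$ on off-diagonal entries, with the diagonal reset so that rows sum to one; the lazy kernel is built with 0-indexed positions; and the names `lazy_transpose_with_random` and `symmetric_perturbation` are hypothetical.

```python
import itertools
import numpy as np


def lazy_transpose_with_random(n, i):
    """Lazy kernel (I + Q)/2, where Q swaps position i with a uniform position j."""
    perms = list(itertools.permutations(range(n)))
    index = {p: k for k, p in enumerate(perms)}
    Q = np.zeros((len(perms), len(perms)))
    for x in perms:
        for j in range(n):
            y = list(x)
            y[i], y[j] = y[j], y[i]
            Q[index[x], index[tuple(y)]] += 1.0 / n
    return 0.5 * (np.eye(len(perms)) + Q)


def symmetric_perturbation(Q, eps, rng):
    """Symmetric kernel K with (1 - eps) Q <= K <= (1 + eps) Q entrywise."""
    N = Q.shape[0]
    S = np.triu(rng.uniform(-1.0, 1.0, size=(N, N)), k=1)
    S = S + S.T                      # symmetric factors, zero on the diagonal
    K = Q * (1.0 + eps * S)          # perturb the off-diagonal entries
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, 1.0 - K.sum(axis=1))  # holding probability fixes row sums
    return K


n, eps = 4, 0.3
rng = np.random.default_rng(1)
Q = lazy_transpose_with_random(n, 0)
K = symmetric_perturbation(Q, eps, rng)

# Second largest singular value of K on l^2(u), u uniform.
sigma1 = np.linalg.svd(K, compute_uv=False)[1]
print(sigma1, 1.0 - (1.0 - eps) / (2 * n))
```

Because the lazy kernel holds with probability more than $1/2$, resetting the diagonal keeps it inside the band required by (b).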

Hence, $\mathcal{Q}^{\#}(\epsilon)$ is the set of all symmetric edge perturbations of kernels in
$\mathcal{Q}$. As we require symmetry, the uniform distribution is invariant for all
the kernels in $\mathcal{Q}^{\#}(\epsilon)$. Now, what can be said of the merging properties of
sequences $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}^{\#}(\epsilon)$? Unlike $\mathcal{Q}$, the kernels in $\mathcal{Q}^{\#}(\epsilon)$ are not
invariant under left multiplication in $S_n$. So the eigenvectors of Theorem 3.2
in [32] do not drop out, and we only get

\[ d_2(K_{0,t}(x,\cdot),u)^2 \le n! \prod_{i=1}^{t} \sigma_1(K_i,u)^2. \]

Singular value comparison yields $\sigma_1(K_i,u) \le 1 - (1-\epsilon)/(2n)$ which gives

\[ t \ge (1-\epsilon)^{-1} n(n\log n + 2c) \implies d_2(K_{0,t}(x,\cdot),u) \le e^{-c} \quad \forall x \in S_n. \]
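
The threshold in the last display can be recovered from the two bounds just stated, using only $1 - a \le e^{-a}$ and $n! \le n^n$. The short computation below is a sketch of that step, written out here for convenience.

```latex
\begin{align*}
d_2(K_{0,t}(x,\cdot),u)^2
  &\le n! \left(1 - \frac{1-\epsilon}{2n}\right)^{2t} \\
  &\le \exp\left(\log n! - \frac{(1-\epsilon)\,t}{n}\right) \\
  &\le \exp\left(n\log n - \frac{(1-\epsilon)\,t}{n}\right).
\end{align*}
```

If $t \ge (1-\epsilon)^{-1} n(n\log n + 2c)$, the exponent is at most $-2c$, so that $d_2(K_{0,t}(x,\cdot),u) \le e^{-c}$.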

This indicates merging after order $n^2\log n$ steps instead of the expected
order $n\log n$ steps. For any sequence with $K_i \in \mathcal{Q}^{\#}(\epsilon)$ for all $i \ge 1$,
set $P_i = K_iK_i^*$ where $K_i^*$ is the adjoint of the operator $K_i : \ell^2(u) \to \ell^2(u)$. A
simple comparison argument gives

\[ P_i(x,y) \ge (1-\epsilon)^2 Q_j^2(x,y) \]

for some $j \in [1,n]$. Further comparison yields $l'(P_i) \ge (1-\epsilon)^2 l'(Q_j^2)$.
Lemma 2.5 of [11] implies that $l'(Q_j^2) \ge l'(Q_j)$ so $l'(P_i)$ is of order at least $1/n$.
Hence, there exists some constant $C(\epsilon)$ independent of $n$ such that

\[ \|K_{0,t}(x,\cdot) - K_{0,t}(y,\cdot)\|_{TV} \le \sqrt{2\log n!}\,(1 - C(\epsilon)/n)^{t/2}. \]

In particular, for some constant $D(\epsilon)$ we get $T_{TV}(\eta) \le D(\epsilon)n(\log n +
\log_+ 1/\eta)$. To obtain a result for the relative-sup norm, one can use the
(nonmodified) logarithmic Sobolev technique as the modified logarithmic
Sobolev technique only gives bounds in total variation. It is known that the
logarithmic Sobolev constant for top to random is of order $1/(n\log n)$, see
[20], leading to results that are off by a factor of $\log n$. This technique yields
the best available result,



\[ t \ge C(\epsilon)n((\log n)^2 + c) \implies \max_{x,y,z}\left\{\left|\frac{K_{0,t}(x,z)}{K_{0,t}(y,z)} - 1\right|\right\} \le e^{-c}. \]



4.6.2. Sticky permutations. We now consider a second variation on the
transpose cyclic to random example. Let $\rho \in S_n$, $\delta \in (0, 1 - Q_1(\rho,\rho))$ and
consider the Markov kernel

\[ K(x,y) = \begin{cases} Q_1(x,y), & \text{if } x \ne \rho, \\ Q_1(x,y) + \delta, & \text{if } x = y = \rho, \\ Q_1(x,y) - \delta/(n-1), & \text{if } x = \rho \text{ and } x^{-1}y = (1,j) \text{ for } j \in [2,n]. \end{cases} \]

In words, $K$ is obtained from $Q_1$ by adding extra holding probability at $\rho$,
making $\rho$ "sticky." Next, if $\sigma$ is the cycle $(1, \dots, n)$, let

\[ K_i(x,y) = K(\sigma^{i-1}x\sigma^{-i+1}, \sigma^{i-1}y\sigma^{-i+1}). \]

In words, $K_i$ is $Q_i$ with some added holding at $\rho_i = \sigma^{-i+1}\rho\sigma^{i-1}$.

We would like to consider the merging properties of the sequence $(K_i)_1^\infty$.
Unlike the previous example, the uniform probability is not invariant under
$K_i$. However, this type of construction is considered in [33].

Let 



£,^Qi(P.*) 

so that $K(x,y) \ge (1-\epsilon)Q_1(x,y)$. It is proved that $(K_i)_1^\infty$ is $(1-\epsilon)^{-1}$-stable
with respect to the probability measure $\mu_0 = \widetilde{\pi}$, where $\widetilde{\pi}$ is the invariant
probability measure of the Markov kernel $\widetilde{K}(x,y) = K(x,\sigma^{-1}y\sigma)$. From the
analysis in [33], Section 5, one can see that

\[ (1-\epsilon)u \le \widetilde{\pi} \le (1-\epsilon)^{-1}u. \]

Applying the singular value techniques used in Section 5 of [33] would give
us an upper bound on the relative sup merging time of order $n^2\log n$.

Set $P_i = K_iK_i^* : \ell^2(\mu_{i-1}) \to \ell^2(\mu_{i-1})$ where $K_i^*$ is the adjoint of the
operator $K_i : \ell^2(\mu_i) \to \ell^2(\mu_{i-1})$. Since $K_i(x,y) \ge (1-\epsilon)Q_i(x,y)$, for $x \ne y$ we can
write

\[ P_i(x,y) = \sum_z K_i(x,z)K_i(y,z)\mu_{i-1}(y)\mu_i(z)^{-1} \ge (1-\epsilon)^4 Q_i^2(x,y). \]

It follows by comparison that $l'(P_i) \ge (1-\epsilon)^5 l'(Q_i^2)$. We can successfully
estimate $l'(Q_i^2)$ due to Lemma 2.5 of [11] which implies $l'(Q_i^2) \ge l'(Q_i)$. So
we have that $l'(P_i)$ is at least $(1-\epsilon)^5/(4(n-1))$. Proposition 4.20 gives us
that
\[ \|K_{0,t}(x,\cdot) - K_{0,t}(y,\cdot)\|_{TV} \le \sqrt{2\log\frac{n!}{1-\epsilon}}\left(1 - \frac{p(1-\epsilon)^5}{4(n-1)}\right)^{t/2}, \]

where $p$ is as in Proposition 4.19. So for some constant $D = D(\epsilon)$, we get
$T_{TV}(\eta) \le Dn(\log n + \log_+(1/\eta))$.
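
The passage from a bound of the form $\sqrt{2\log n!}\,(1 - C/n)^{t/2}$ to a merging time of order $n(\log n + \log_+(1/\eta))$ is simple arithmetic, and the sketch below carries it out numerically. It is an illustration under stated assumptions: the prefactor $\sqrt{2\log n!}$ is the one from Section 4.6.1, the value of `C` is a hypothetical placeholder for a constant such as $C(\epsilon)$, and the function names `tv_bound` and `merging_time_bound` are not from the paper.

```python
import math


def tv_bound(n, C, t):
    """Value of sqrt(2 log n!) * (1 - C/n)^(t/2)."""
    return math.sqrt(2.0 * math.lgamma(n + 1)) * (1.0 - C / n) ** (t / 2.0)


def merging_time_bound(n, C, eta):
    """Smallest integer t >= 0 with tv_bound(n, C, t) <= eta."""
    if not (0.0 < C < n) or eta <= 0.0:
        raise ValueError("need 0 < C < n and eta > 0")
    prefactor = math.sqrt(2.0 * math.lgamma(n + 1))  # lgamma(n + 1) = log n!
    if prefactor <= eta:
        return 0
    # Solve prefactor * (1 - C/n)^(t/2) = eta for t, then round up.
    t = math.ceil(2.0 * math.log(prefactor / eta) / -math.log1p(-C / n))
    while tv_bound(n, C, t) > eta:  # guard against floating-point rounding
        t += 1
    return t


n, C, eta = 52, 0.25, 0.01  # hypothetical values
t = merging_time_bound(n, C, eta)

# Since -log(1 - C/n) >= C/n and log n! <= n log n, t is at most
# (n / C) * (log(2 n log n) + 2 log(1/eta)) + 1, which is of order
# n (log n + log(1/eta)).
upper = (n / C) * (math.log(2.0 * n * math.log(n)) + 2.0 * math.log(1.0 / eta)) + 1
print(t, upper)
```

The explicit upper estimate in the final comment is where the shape $Dn(\log n + \log_+(1/\eta))$ comes from, with $D$ depending only on the constant in the exponent.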